Vaccines



The present invention relates to fusion partners which act as immunological fusion 
partners, as expression enhancers, and preferably to fusion partners having both 
functions. The invention also relates to fusion proteins containing them, to their 
manufacture, to their use in vaccines and to their use in medicines. In particular fusion 
partners are provided that contain a so-called choline binding domain, for example fusions 
comprising LytA from Streptococcus pneumoniae, or the pneumococcal phage CP1 
lysozyme (CPL1) wherein the choline binding domain is modified to include a heterologous 
T-helper epitope. Such fusion partners are shown to improve the expression level of the 
heterologous protein attached thereto and also find particular utility when fused to poorly 
immunogenic proteins or peptides that are otherwise useful as vaccine antigens. More 
particularly, such fusion partners are useful in constructs comprising self-antigens, eg 
tumour specific or tissue specific antigens. 

Streptococcus pneumoniae synthesises an N-acetyl-L-alanine amidase, LytA, an 
autolysin that specifically degrades the peptidoglycan backbone of the cell wall eventually 
leading to cell lysis. Its polypeptide chain has two domains. The N-terminal domain is 
responsible for the catalytic activity, whereas the C-terminal domain of LytA is responsible 
for the affinity to choline and anchorage to the cell wall. This C-terminal domain is known 
to bind to choline and choline analogues, and will also bind to tertiary amines such as 
DEAE (diethyl amino ethyl) commonly used in chromatography. 

LytA is a 318 amino acid protein, and the C-terminal part comprises a tandem of six 
imperfect repeats of 20 or 21 amino acids and a short COOH-terminal tail. The repeats are 
located at the following positions: 

R1: 177-191 

R2: 192-212 

R3: 213-234 

R4: 235-254 

R5: 255-275 

R6: 276-298 
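
As a quick cross-check of the positions listed above, the following minimal Python sketch computes the length of each repeat from its start and end residue (1-based, inclusive) and confirms that the six spans tile residues 177-298 without gaps. The function and variable names are illustrative only and are not part of the original disclosure.

```python
# Repeat boundaries (1-based, inclusive) exactly as listed in the text.
REPEATS = {
    "R1": (177, 191),
    "R2": (192, 212),
    "R3": (213, 234),
    "R4": (235, 254),
    "R5": (255, 275),
    "R6": (276, 298),
}


def repeat_length(start: int, end: int) -> int:
    """Number of residues in an inclusive span."""
    return end - start + 1


def repeat_lengths(repeats=REPEATS) -> dict:
    """Map each repeat name to its length in residues."""
    return {name: repeat_length(s, e) for name, (s, e) in repeats.items()}


def is_contiguous(repeats=REPEATS) -> bool:
    """True if each span starts immediately after the previous one ends."""
    spans = sorted(repeats.values())
    return all(nxt[0] == cur[1] + 1 for cur, nxt in zip(spans, spans[1:]))


if __name__ == "__main__":
    for name, length in repeat_lengths().items():
        print(name, length)
    print("contiguous:", is_contiguous())
```

With the positions as given, R1 spans only 15 residues, which is consistent with the later statement that the 177-298 region contains a portion of the first repeat.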

These repeats are predicted to be in a beta-turn conformation. The C-terminus is 
responsible for binding choline. Likewise the C-terminus of CPL1 is responsible for binding 
affinity and the aromatic residues in the repeat contribute to such binding. These proteins 
have been used as affinity tags to allow for rapid purification (Sanchez Puelles, Eur J 
Biochem. 1992, 203, 153-9). 

Other proteins with a choline-binding domain have also been studied in 
Streptococcus pneumoniae. 
One of them, PspA (or Pneumococcal Surface Protein A), is a virulence factor 
(Yother J and Briles (1992) J Bacteriol 174(2) p 601). This protein is antigenic and 
immunogenic. It has a C-terminal domain consisting of 10 repeats of 20 amino acids, 
homologous with repeats of LytA. 

CbpA (or Choline-Binding Protein A) is involved in the adherence of the 
pneumococcus to human cells (Rosenow et al (1997) Mol Microbiol 25 (5) p 819). It 
shows 10 repeats of 20 amino acids in the C-terminal domain which are almost identical to 
those of PspA. 

LytB and LytC have a different modular organisation from the above-mentioned 
proteins as their choline-binding domain, made up of 15 repeats and 11 repeats 
respectively, is situated at the N-terminal end, not at the C-terminal end (Garcia P Mol 
Microbiol (1999) 31 (4) p1275 and Garcia P et al (1999) Mol Microbiol 33(1) p128). 
Sequence comparison shows LytB to have glucosamidase activity. LytC shows in vitro a 
lysozyme-type activity. 

Additionally, three genes called PepA, PepB and PepC were cloned in 1995. 
Although their function is unknown, these genes also have a variable number of repeats 
homologous to those of LytA. 

In their infection cycle, phages synthesise murein hydrolases facilitating their 
passage into the bacterium. These hydrolases have a choline-binding domain. 

The muramidase CPL1 of the phage Cp-1 has been well studied. It shows 6 
repeats of 20 amino acids at the C-terminus involved in the specific recognition of choline 
(Garcia J. L. (1987) J. Virol 61 (8) p 2573-80 and Garcia E. (1988) Proc Natl Acad Sci 
p 914). A comparison of the LytA and CPL1 repeats enables an initial consensus of those 
repeats to be made. 

The murein hydrolases of phages Dp-1 (Garcia P et al (1983) J Gen Microbiol 129 
(2) p 489), Cpl-9 (Garcia P et al (1989) Biochem Biophys Res Commun 158(1) p 251), HB-3 
(Romero et al (1990) J Bacteriol 172 (9) p 5064-5070) and EJ-1 (Diaz (1992) J Bacteriol 174 
(17) p 5516) also show the characteristics of choline-binding domains. 

This property is also shared by the lysozyme encoded by CP-1, a pneumococcal 
phage. 

WO 99/10375 describes, inter alia, human papilloma virus proteins E6 or E7 linked 
to a His tag and the C-terminal portion of LytA (herein C-LytA), and the purification of the 
proteins by differential affinity chromatography. 

WO 99/40188 describes, inter alia, fusion proteins comprising MAGE antigens with a 
His tail and a C-LytA portion at the N-terminus of the molecule. 

It has now been surprisingly found that fusion partners according to the present 
invention, when fused to a heterologous protein, are capable of enhancing the 
immunogenicity of the heterologous protein attached thereto. It has also been found that 
the expression level of the heterologous proteins attached thereto can be enhanced. The 
present invention accordingly provides in a preferred embodiment an improved 
immunological fusion partner which can also act as an expression enhancer. 

Accordingly the present invention comprises a fusion molecule comprising a 
choline binding domain or a fragment thereof or an analogue thereof, and a heterologous 
promiscuous MHC Class II T-epitope, wherein said fusion partner shows a capability of 
acting as an immunological fusion partner or as an expression enhancer, and 
preferably as both an immunological partner and expression enhancer. A promiscuous 
T-helper epitope is an epitope that binds to more than one MHC Class II allele, preferably 
more than 3 MHC Class II alleles. In particular such epitopes are capable of eliciting a helper 
T cell response in large numbers of individuals expressing diverse MHC haplotypes. 
Optionally, the fusion protein may retain its capability to bind to choline. 
In one embodiment of the present invention the modified choline binding domain 
(fusion partner) has a capability of acting as an expression enhancer, such that the resulting 
fusion protein is expressed at a higher yield in a host cell as compared to the unfused 
protein, preferably at a yield higher by about 100% (2-fold higher) or by 150% or more, as 
measured by SDS-PAGE followed by Coomassie blue staining or silver staining, optionally 
followed by gel scanning. The modified choline binding domain according to the invention 
also has the capability of acting as an immunological partner, such that the resulting fusion 
protein with a heterologous protein is more immunogenic in a host as compared to the 
unfused heterologous protein. 
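
The yield comparison described above can be illustrated with a small worked calculation. The sketch below assumes that gel scanning gives one band intensity for the fused protein and one for the unfused protein, in the same arbitrary units; the function names and input values are hypothetical and are not taken from the original disclosure.

```python
def fold_change(fused_signal: float, unfused_signal: float) -> float:
    """Ratio of fused to unfused band intensity (same arbitrary units)."""
    if unfused_signal <= 0:
        raise ValueError("unfused_signal must be positive")
    return fused_signal / unfused_signal


def percent_increase(fused_signal: float, unfused_signal: float) -> float:
    """Yield increase of the fusion relative to the unfused protein, in percent."""
    return (fold_change(fused_signal, unfused_signal) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical densitometry readings, for illustration only.
    print(percent_increase(2.0, 1.0))  # a 2-fold yield is a 100% increase
    print(percent_increase(2.5, 1.0))  # a 2.5-fold yield is a 150% increase
```

On this reading, an increase of 100% corresponds to the 2-fold figure quoted in the text, and an increase of 150% to a 2.5-fold yield.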

In another embodiment of the present invention, the modified choline binding 
domain has the capability to act as an immunological fusion partner, allowing an enhanced 
immune response to be obtained with the fusion protein as compared to the heterologous 
protein alone. 

In a preferred embodiment, the modified choline binding domain has a dual 
function, having the capability to act as both an immunological fusion partner and as an 
expression enhancer. 

In a preferred embodiment the choline binding moiety is derived from the 
C-terminus of LytA. Preferably the C-LytA or derivative comprises at least four repeats. In 
this context, C-LytA derivatives refers to variants of C-LytA according to the present 
invention, that is to say variants which have retained both the capability of acting as an 
immunological partner and an expression enhancer. Preferred variants include, for 
example, peptides comprising an amino acid sequence having at least 85% identity, 
preferably at least 90% identity, more preferably at least 95% identity, most preferably at 
least 97-99% identity, to any of the repeats R1 to R6 set forth in figure 1 (SEQ ID NO:1 to 
6), or a peptide comprising an amino acid sequence having at least 15, 20, 30, 40, 50 or 
100 contiguous amino acids from the amino acid sequence set forth in figure 1 (SEQ ID 
NO:1 to 8). 



Accordingly, in one aspect of the invention there is provided a fusion partner 
protein comprising a modified choline binding domain and a heterologous promiscuous T 
helper epitope, wherein the choline binding domain is selected from the group comprising: 

a) the C-terminal domain of LytA as set forth in SEQ ID NO:7; 

b) the sequence of SEQ ID NO:8; 

c) a peptide sequence comprising an amino acid sequence having at least 85% 
identity, preferably at least 90% identity, more preferably at least 95% identity, 
most preferably at least 97-99% identity, to any of SEQ ID NO:1 to 6; 

d) a peptide sequence comprising an amino acid sequence having at least 15, 20, 
30, 40, 50 or 100 contiguous amino acids from the amino acid sequence of SEQ 
ID NO:7 or SEQ ID NO:8. 
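
The two sequence criteria in options c) and d) can be illustrated as follows. This is a simplified sketch: it computes ungapped identity over two pre-aligned sequences of equal length, whereas identity in practice is normally determined with a sequence alignment program. The function names and toy sequences are hypothetical; the actual SEQ ID NO sequences are not reproduced here.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def shares_contiguous_run(candidate: str, reference: str, min_len: int) -> bool:
    """True if candidate contains at least min_len contiguous residues of reference."""
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    windows = range(len(reference) - min_len + 1)
    return any(reference[i:i + min_len] in candidate for i in windows)


if __name__ == "__main__":
    # Toy 20-residue strings differing at one position (placeholders, not real repeats).
    ref = "ABCDEFGHIJKLMNOPQRST"
    var = "ABCDEFGHIJKLMNOPQRSX"
    print(percent_identity(ref, var))            # 19 of 20 positions match
    print(shares_contiguous_run(var, ref, 15))   # the first 19 residues are shared
```

A single substitution in a 20-residue repeat gives 95% identity under this simplified measure, so it would meet the 85%, 90% and 95% thresholds but not the 97-99% one.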

In a most preferred embodiment, the C-LytA extends from amino acid 177 to 298, 
which contains a portion of the first repeat and the five other repeats in full, and is set forth in 
figure 1. 

The second component of the fusion partner, the heterologous T-cell epitope, is 
preferably selected from the group of epitopes that will bind, in a number of individuals, 
to more than one MHC II molecule in humans. For example, epitopes that are 
specifically contemplated are the P2 and P30 epitopes from tetanus toxoid (Panina-Bordignon, 
Eur. J. Immunol 19 (12), 2237 (1989)). In a preferred embodiment the 
heterologous T-cell epitope is P2 or P30 from Tetanus toxin. 

The P2 epitope has the sequence QYIKANSKFIGITE and corresponds to amino 
acids 830-843 of the Tetanus toxin. 

The P30 epitope (residues 947-967 of Tetanus Toxin) has the sequence 
FNNFTVSFWLRVPKVSASHLE. The FNNFTV sequence may optionally be deleted. 
Other universal T epitopes can be derived from the circumsporozoite protein from 
Plasmodium falciparum - in particular the region 378-398 having the sequence 
DIEKKIAKMEKASSVFNVVNS (Alexander J, (1994) Immunity 1 (9), p 751-761). 
Another epitope is derived from Measles virus fusion protein at residue 288-302 having the 
sequence LSEIKGVIVHRLEGV (Partidos CD, 1990, J. Gen. Virol 71(9) 2099-2105). 
Yet another epitope is derived from hepatitis B virus surface antigen, in particular amino 
acids, having the sequence FFLLTRILTIPQSLD. 

Another set of epitopes is derived from diphtheria toxin. Four of these peptides 
(amino acids 271-290, 321-340, 331-350, 351-370) map within the T domain of fragment B 
of the toxin, and the remaining 2 map in the R domain (411-430, 431-450): 
PVFAGANYAAWAVNVAQVI 
VHHNTEEIVAQSIALSSLMV 
QSIALSSLMVAQAIPLVGEL 
VDIGFAAYNFVESIINLFQV 
QGESGHDIKITAENTPLPIA 
GVLLPTIPGKLDVNKSKTHI 

(Raju R., Navaneetham D., Okita D., Diethelm-Okita B., McCormick D., Conti-Fine B. M. 
(1995) Eur. J. Immunol. 25: 3207-14.) 
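
The tetanus toxin epitopes quoted above come with both a sequence and a residue range, so the two can be checked against each other. The following minimal sketch does this for P2 and P30 and also derives the shortened P30 variant in which the FNNFTV sequence is deleted. The sequences and ranges are those given in the text; the function names are illustrative.

```python
# (sequence, first residue, last residue) as stated in the text.
EPITOPES = {
    "P2": ("QYIKANSKFIGITE", 830, 843),
    "P30": ("FNNFTVSFWLRVPKVSASHLE", 947, 967),
}


def span_length(start: int, end: int) -> int:
    """Number of residues in an inclusive residue range."""
    return end - start + 1


def length_matches(sequence: str, start: int, end: int) -> bool:
    """True if the sequence length agrees with its stated residue range."""
    return len(sequence) == span_length(start, end)


def truncated_p30(p30: str, prefix: str = "FNNFTV") -> str:
    """Return P30 with the optional N-terminal FNNFTV sequence removed."""
    return p30[len(prefix):] if p30.startswith(prefix) else p30


if __name__ == "__main__":
    for name, (seq, start, end) in EPITOPES.items():
        print(name, len(seq), length_matches(seq, start, end))
    print(truncated_p30(EPITOPES["P30"][0]))
```

Both epitopes agree with their stated ranges (14 and 21 residues respectively), and the shortened P30 variant is the 15-residue sequence SFWLRVPKVSASHLE.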

The heterologous T-epitope is preferably fused to C-LytA containing at least 4 
repeats, preferably repeats 2-5 inclusive. One or more subsequent repeats may optionally 
be fused to the C-terminus of the T-epitope. 

Alternatively, the heterologous T-epitope is preferably inserted between two 
consecutive repeats of C-LytA containing a total of at least 4 repeats, or inserted into one 
of the repeats of C-LytA containing a total of at least 4 repeats. More preferably, the 
C-LytA contains 6 repeats and the heterologous epitope is inserted within and at the 
beginning of the sixth repeat of C-LytA. 

The present invention further provides, in other aspects, fusion proteins that 
comprise at least one polypeptide as described above, as well as polynucleotides 
encoding such fusion proteins, typically in the form of pharmaceutical compositions, e.g., 
vaccine compositions, comprising a physiologically acceptable carrier and/or an 
immunostimulant. 

Thus a self-protein or other poorly immunogenic protein may be fused to either the 
N or C terminal end of the resulting fusion partner. Alternatively the self protein or poorly 
immunogenic protein may be inserted into the fusion partner. In an optional embodiment a 
histidine tag of at least four, preferably more than 6 histidine residues, may be fused to the 
alternative end of the poorly immunogenic protein. This would allow for the protein to be 
purified by affinity chromatography steps, as a histidine tail, typically comprising at least 
four, preferably six or more residues, binds to metal ions and therefore is suitable for 
immobilised metal ion affinity chromatography (IMAC). 

Typical constructs would therefore comprise: 

Poorly immunogenic protein - C-LytA repeats 1-4 - P2 epitope (inserted in or replacing C-LytA repeat 5) - C-LytA repeat 6 

C-LytA repeats 1-4 - P2 epitope (inserted in or replacing C-LytA repeat 5) - C-LytA repeat 6 - Poorly immunogenic protein 

Poorly immunogenic protein - C-LytA repeats 2-5 - P2 epitope (inserted into C-LytA repeat 6) 

C-LytA repeats 2-5 - P2 epitope (inserted into C-LytA repeat 6) - Poorly immunogenic protein 

Poorly immunogenic protein - C-LytA repeats 1-5 - P2 epitope inserted in C-LytA repeat 6 

C-LytA repeats 1-5 - P2 epitope inserted in C-LytA repeat 6 - Poorly immunogenic protein 

Poorly immunogenic protein - P2 epitope inserted into C-LytA repeat 1 - C-LytA repeats 2-5 

P2 epitope inserted into C-LytA repeat 1 - C-LytA repeats 2-5 - Poorly immunogenic protein 

Poorly immunogenic protein - P2 epitope inserted into C-LytA repeat 1 - C-LytA repeats 2-6 

P2 epitope inserted into C-LytA repeat 1 - C-LytA repeats 2-6 - Poorly immunogenic protein 

Poorly immunogenic protein - C-LytA repeat 1 - P2 epitope inserted into C-LytA repeat 2 - C-LytA repeats 3-6 

C-LytA repeat 1 - P2 epitope inserted into C-LytA repeat 2 - C-LytA repeats 3-6 - Poorly immunogenic protein; 

where "inserted into" means at any place into the repeat for example between residue 1 
and 2, or between 2 and 3, etc. 

The promiscuous T helper epitope may be inserted within a repeat region, for 
example C-LytA repeats 2-5 - C-LytA repeat 6a - P2 epitope - C-LytA repeat 6b, where the 
P2 epitope is inserted within the sixth repeat (see figure 2). 
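
The "repeat 6a - P2 epitope - repeat 6b" arrangement can be sketched as a simple string operation: the target repeat is split at a chosen position and the epitope is placed between the two parts. In the sketch below the repeat sequences are placeholder strings, not real C-LytA sequences (which are given only in the figures), and the function names are hypothetical. Only the P2 sequence is taken from the text.

```python
P2 = "QYIKANSKFIGITE"  # P2 epitope sequence as given in the text


def insert_into_repeat(repeat: str, epitope: str, position: int):
    """Split a repeat after `position` residues and return (part_a, epitope, part_b).

    position=1 corresponds to insertion between residues 1 and 2 of the repeat.
    """
    if not 0 <= position <= len(repeat):
        raise ValueError("position must lie within the repeat")
    return repeat[:position], epitope, repeat[position:]


def build_fusion_partner(upstream_repeats, target_repeat, epitope, position) -> str:
    """Concatenate upstream repeats with the epitope-bearing target repeat."""
    part_a, ep, part_b = insert_into_repeat(target_repeat, epitope, position)
    return "".join(upstream_repeats) + part_a + ep + part_b


if __name__ == "__main__":
    # Placeholder repeats standing in for C-LytA repeats 2-5 and repeat 6.
    upstream = ["AAAA", "CCCC", "GGGG", "HHHH"]
    repeat_6 = "DDDDDD"
    print(build_fusion_partner(upstream, repeat_6, P2, 1))
```

Choosing a small `position` places the epitope near the beginning of the target repeat, in line with the preference for insertion at the beginning of the sixth repeat.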

In other preferred embodiments the C-terminal end of CPL1 (C-CPL1) may be used 
as an alternative to C-LytA. 

Alternatively, the P2 epitope in the above constructs may be replaced by other 
promiscuous T epitopes, for example P30. In an embodiment of the invention, two or more 
promiscuous epitopes are part of the fusion construct. It is however preferred to keep the 
fusion partner as small as possible, thus limiting the number of potentially interfering CD8+ 
and B epitopes. Thus the fusion partner is preferably no bigger than 100-140 amino acids, 
preferably no bigger than 120 amino acids, typically about 100 amino acids. 

The antigen to which the fusion partner is fused may be from bacterial, viral, 
protozoan, fungal or mammalian, including human, sources. 

The fusion partner of the present invention is preferably fused to a self antigen 
such as a tumour associated or tissue specific antigen, such as those for prostate, breast, 
colorectal, lung, pancreatic, ovarian, renal or melanoma cancers. Fragments of said self or 
tumour antigens are expressly contemplated to be fused to the fusion partner of the 
invention. Typically the fragment will contain at least 20, preferably 50, more preferably 
100 contiguous amino acids of the full-length sequence. Typically such fragments will be 
devoid of one or more transmembrane domains or may have N-terminal or C-terminal 
deletions of about 3, 5, 8, 10, 15, 20, 28, 33, 50 or 54 amino acids. Such fragments will, 
when suitably presented, be able to generate immune responses that recognise the full 
length protein. 

Particularly illustrative polypeptides of the present invention comprise a sequence 
of at least 10 contiguous amino acids, preferably 20, more preferably 30, 40, 50, 60, 70, 
80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180 amino acids of a tumour associated or 
tissue specific protein fused to the fusion partner. 

The polypeptides of the invention are immunogenic, i.e. they react detectably 
within an immunoassay (such as an ELISA or T-cell stimulation assay) with antisera and/or 
T-cells from a patient with cripto-expressing cancer. Screening for immunogenic activity 
can be performed using techniques well known to the skilled artisan. For example, such 
screens can be performed using methods such as those described in Harlow and Lane, 
Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, 1988. In one illustrative 
example, a polypeptide may be immobilised on a solid support and contacted with patient 
sera to allow binding of antibodies within the sera to the immobilised polypeptide. 
Unbound sera may then be removed and bound antibodies detected using, for example, 
125I-labeled Protein A. 

As would be recognised by the skilled artisan, immunogenic portions of tumour 
associated or tumour specific antigens are also encompassed by the present invention. An 
"immunogenic portion" as used herein, is a fragment that itself is immunologically reactive 
(i.e. specifically binds) with the B-cell and/or T-cell surface antigen receptors that 
recognize the polypeptide. Immunogenic portions may generally be identified using well 
known techniques, such as those summarized in Paul, Fundamental Immunology, 3rd ed., 
243-247 (Raven Press, 1993) and references cited therein. Such techniques include 
screening polypeptides for the ability to react with antigen-specific antibodies, antisera 
and/or T-cell lines or clones. As used herein, antisera and antibodies are 
"antigen-specific" if they specifically bind to an antigen (i.e. they react with the protein in an ELISA 
or other immunoassay, and do not react detectably with unrelated proteins). Such antisera 
and antibodies may be prepared as described herein, and using well-known techniques. 

In one preferred embodiment, an immunogenic portion of a polypeptide is a portion 
that reacts with antisera and/or T-cells at a level that is not substantially less than the 
reactivity of the full-length polypeptide (e.g. in an ELISA and/or T-cell reactivity assay). 
Preferably, the level of immunogenic activity of the immunogenic portion is at least about 
50%, preferably at least about 70% and most preferably greater than about 90% of the 
immunogenicity for the full-length polypeptide. In some instances, preferred immunogenic 
portions will be identified that have a level of immunogenic activity greater than that of the 
corresponding full-length polypeptide, e.g., having greater than about 100% or 150% or 
more immunogenic activity. 
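
The thresholds above can be read as a simple comparison of the assay signal for the portion against the signal for the full-length polypeptide. The sketch below assumes both signals come from the same assay (for example ELISA readings in the same units). The tier labels, function names and input values are hypothetical and serve only to illustrate the 50%, 70% and 90% cut-offs.

```python
def relative_activity(portion_signal: float, full_length_signal: float) -> float:
    """Activity of the portion as a percentage of the full-length polypeptide."""
    if full_length_signal <= 0:
        raise ValueError("full_length_signal must be positive")
    return 100.0 * portion_signal / full_length_signal


def preference_tier(relative_percent: float) -> str:
    """Classify a relative activity against the approximate thresholds in the text."""
    if relative_percent > 90.0:
        return "most preferred"
    if relative_percent >= 70.0:
        return "preferred"
    if relative_percent >= 50.0:
        return "acceptable"
    return "below threshold"


if __name__ == "__main__":
    # Hypothetical assay readings, for illustration only.
    rel = relative_activity(45, 60)
    print(rel, preference_tier(rel))
```

Because the text states the thresholds as approximate ("about"), the hard cut-offs used here are a simplification.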

In certain other embodiments, illustrative immunogenic portions may include 
peptides in which an N-terminal leader sequence and/or transmembrane domain have 
been deleted. Other illustrative immunogenic portions will contain a small N- and/or C- 
terminal deletion (e.g., about 1-50 amino acids, preferably about 1-30 amino acids, more 
preferably about 5-15 amino acids), relative to the mature protein. 



Exemplary antigens or fragments derived therefrom include MAGE 1, MAGE 3 and 
MAGE 4 or other MAGE antigens such as disclosed in WO 99/40188, PRAME (WO 
96/10577), BAGE, RAGE, LAGE (also known as NY-ESO-1), SAGE and HAGE (WO 
99/53061) or GAGE (Robbins and Kawakami, 1996, Current Opinion in Immunology 8, 
pp 628-636; Van den Eynde et al., International Journal of Clinical & Laboratory Research 
(submitted 1997); Correale et al. (1997), Journal of the National Cancer Institute 89, p 293). 
Indeed these antigens are expressed in a wide range of tumour types such as melanoma, 
lung carcinoma, sarcoma and bladder carcinoma. 

In a preferred embodiment prostate antigens are utilised, such as Prostate specific 
antigen (PSA), PAP, PSCA (PNAS 95(4) 1735 -1740 1998), PSMA or the antigen known 
as prostase. 

In a particularly preferred embodiment, the prostate antigen is P501S or a fragment 
thereof. P501S, also named prostein (Xu et al., Cancer Res. 61, 2001, 1563-1568), is 
known as sequence ID no 113 of WO 98/37814 and is a 553 amino acid protein. 
Immunogenic fragments and portions thereof comprising at least 20, preferably 50, more 
preferably 100 contiguous amino acids as disclosed in the above referenced patent 
application are specifically contemplated by the present invention. Preferred fragments 
are disclosed in WO 98/50567 (PS108 antigen) and as prostate cancer-associated protein 
(WO 99/67384 SEQ ID NO: 9). Other preferred fragments are amino acids 51-553, 34-553 
or 55-553 of the full-length P501S protein. 

In particular, constructs 1, 2 and 3 (see figure 2) are expressly contemplated, and 
can be expressed in yeast systems; for example, DNA sequences encoding such 
polypeptides can be expressed in a yeast system. 

Prostase is a prostate-specific serine protease (trypsin-like), 254 amino acids long, 
with a conserved serine protease catalytic triad H-D-S and an amino-terminal 
pre-propeptide sequence, indicating a potential secretory function (P. Nelson, Lu Gan, C. 
Ferguson, P. Moss, R. Gelinas, L. Hood & K. Wang, "Molecular cloning and characterisation 
of prostase, an androgen-regulated serine protease with prostate restricted expression", in 
Proc. Natl. Acad. Sci. USA (1999) 96, 3114-3119). A putative glycosylation site has been 
described. The predicted structure is very similar to other known serine proteases, 
showing that the mature polypeptide folds into a single domain. The mature protein is 224 
amino acids long, with one A2 epitope shown to be naturally processed. 

Prostase nucleotide sequence and deduced polypeptide sequence and 
homologues are disclosed in Ferguson, et al. (Proc. Natl. Acad. Sci. USA 1999, 96, 
3114-3119) and in International Patent Applications No. WO 98/12302 (and also the 
corresponding granted patent US 5,955,306), WO 98/20117 (and also the corresponding 
granted patents US 5,840,871 and US 5,786,148) (prostate-specific kallikrein) and WO 
00/04149 (P703P). 

Other prostate specific antigens are known from WO 98/37418 and WO 00/04149. 
Another is STEAP (PNAS 96 14523-14528 7-12 1999). 

Other tumour associated antigens useful in the context of the present invention 
include: Plu-1 (J Biol. Chem 274 (22) 15633-15645, 1999), HASH-1, HASH-2 (Alders, M. 
et al., Hum. Mol. Genet. 1997, 6, 859-867), Cripto (Salomon et al Bioessays 199, 21 61-70, 
US patent 5654140), CASB616 (WO 00/53216) and Criptin (US 5,981,215). Additionally, 
antigens particularly relevant for vaccines in the therapy of cancer also comprise 
tyrosinase, telomerase and survivin. 

The present invention is also useful in combination with breast cancer antigens 
such as Her-2/neu, mammaglobin (US patent 5668267) or those disclosed in WO 00/52165, 
WO 99/33869, WO 99/19479 and WO 98/45328. Her-2/neu antigens are disclosed, inter 
alia, in US patent 5,801,005. Preferably the Her-2/neu comprises the entire extracellular 
domain (comprising approximately amino acids 1-645) or fragments thereof and at least an 
immunogenic portion of or the entire intracellular domain, approximately the C terminal 580 
amino acids. In particular, the intracellular portion should comprise the phosphorylation 
domain or fragments thereof. Such constructs are disclosed in WO 00/44899. A 
particularly preferred construct is known as ECD PD; a second is known as ECD deltaPD 
(see WO 00/44899). 

The Her-2/neu as used herein can be derived from rat, mouse or human. 
Certain tumour antigens are small peptide antigens (ie less than about 50 amino 
acids). These antigens can be chemically conjugated to the modified choline binding 
protein of the present invention. 

Exemplary peptides include Mucin derived peptides such as Muc 1; see for 
example US 5,744,144, US 5,827,666, WO 88/05054 and US 4,963,484. Specifically 
contemplated are Muc 1 derived peptides that comprise at least one repeat unit of the Muc 
1 peptide, preferably at least two such repeats, and which are recognised by the SM3 
antibody (US 6,054,438). Other mucin derived peptides include peptides from Muc 5. 

Alternatively, said antigen is an interleukin such as IL13 and IL14, which are 
preferred. Or said antigen may be a self peptide hormone such as whole length 
Gonadotropin hormone releasing hormone (GnRH, WO 95/20600), a short 10 amino acid 
long peptide, useful in the treatment of many cancers, or in immunocastration. 

Other tumour-specific antigens suitable to be coupled with the modified choline 
binding protein of the present invention include, but are not restricted to, tumour-specific 
gangliosides such as GM2 and GM3. 

The covalent coupling of the peptide to the modified choline binding protein can be 
carried out in a manner well known in the art. Thus, for example, for direct covalent 
coupling it is possible to utilise a carbodiimide, glutaraldehyde or 
N-[γ-maleimidobutyryloxy] succinimide ester, utilising common commercially available 
heterobifunctional linkers such as CDAP and SPDP (using manufacturer's instructions). 
After the coupling reaction, the immunogen can easily be isolated and purified by means of 
a dialysis method, a gel filtration method, a fractionation method etc. 

The antigen may also be derived from sources which are pathogenic to humans, 
such as HIV-1 (such as tat, nef, reverse transcriptase, gag, gp120 and gp160), 
human herpes viruses, such as gD or derivatives thereof or Immediate Early protein such 
as ICP27 from HSV1 or HSV2, cytomegalovirus (esp. Human) (such as gB or derivatives 
thereof), Rotavirus (including live-attenuated viruses), Epstein Barr virus (such as gp350 or 
derivatives thereof), Varicella Zoster Virus (such as gpI, II and IE63), or from a hepatitis 
virus such as hepatitis B virus (for example Hepatitis B Surface antigen or a derivative 
thereof), hepatitis A virus, hepatitis C virus and hepatitis E virus, or from other viral 
pathogens, such as paramyxoviruses: Respiratory Syncytial virus (such as F and G 
proteins or derivatives thereof), parainfluenza virus, measles virus, mumps virus, human 
papilloma viruses (for example HPV6, 11, 16, 18, ..), flaviviruses (e.g. Yellow Fever Virus, 
Dengue Virus, Tick-borne encephalitis virus, Japanese Encephalitis Virus) or Influenza 
virus (whole live or inactivated virus, split influenza virus, grown in eggs or MDCK cells, or 
whole flu virosomes (as described by R. Gluck, Vaccine, 1992, 10, 915-920) or purified or 
recombinant proteins thereof, such as HA, NP, NA, or M proteins, or combinations 
thereof), or derived from bacterial pathogens such as Neisseria spp, including N. 
gonorrhoeae and N. meningitidis (for example capsular polysaccharides and conjugates 
thereof, transferrin-binding proteins, lactoferrin binding proteins, PilC, adhesins); S. 
pyogenes (for example M proteins or fragments thereof, C5A protease, lipoteichoic acids), 
S. agalactiae, S. mutans; H. ducreyi; Moraxella spp, including M. catarrhalis, also known as 
Branhamella catarrhalis (for example high and low molecular weight adhesins and 
invasins); Bordetella spp, including B. pertussis (for example pertactin, pertussis toxin or 
derivatives thereof, filamentous hemagglutinin, adenylate cyclase, fimbriae), B. 
parapertussis and B. bronchiseptica; Mycobacterium spp., including M. tuberculosis (for 
example ESAT6, Antigen 85A, -B or -C), M. bovis, M. leprae, M. avium, M. 
paratuberculosis, M. smegmatis; Legionella spp, including L. pneumophila; Escherichia 
spp, including enterotoxic E. coli (for example colonization factors, heat-labile toxin or 
derivatives thereof, heat-stable toxin or derivatives thereof), enterohemorrhagic E. coli, 
enteropathogenic E. coli (for example Shiga toxin-like toxin or derivatives thereof); Vibrio 
spp, including V. cholerae (for example cholera toxin or derivatives thereof); Shigella spp, 
including S. sonnei, S. dysenteriae, S. flexneri; Yersinia spp, including Y. enterocolitica 
(for example a Yop protein) , Y. pestis, Y. pseudotuberculosis; Campylobacter spp, 
including C. jejuni (for example toxins, adhesins and invasins) and C. coli; Salmonella spp, 
including S. typhi, S. paratyphi, S. choleraesuis, S. enteritidis; Listeria spp., including L. 
monocytogenes; Helicobacter spp, including H. pylori (for example urease, catalase, 
vacuolating toxin); Pseudomonas spp, including P. aeruginosa; Staphylococcus spp., 
including S. aureus, S. epidermidis; Enterococcus spp., including E. faecalis, E. faecium; 
Clostridium spp., including C. tetani (for example tetanus toxin and derivatives thereof), C. 
botulinum (for example botulinum toxin and derivatives thereof), C. difficile (for example 
Clostridium toxins A or B and derivatives thereof); Bacillus spp., including B. anthracis (for 
example botulinum toxin and derivatives thereof); Corynebacterium spp., including C. 
diphtheriae (for example diphtheria toxin and derivatives thereof); Borrelia spp., including 
B. burgdorferi (for example OspA, OspC, DbpA, DbpB), B. garinii (for example OspA, 
OspC, DbpA, DbpB), B. afzelii (for example OspA, OspC, DbpA, DbpB), B. andersonii (for 
example OspA, OspC, DbpA, DbpB), B. hermsii; Ehrlichia spp., including E. equi and the 
agent of the Human Granulocytic Ehrlichiosis; Rickettsia spp, including R. rickettsii; 
Chlamydia spp., including C. trachomatis (for example MOMP, heparin-binding proteins), 
C. pneumoniae (for example MOMP, heparin-binding proteins), C. psittaci; Leptospira 
spp., including L. interrogans; Treponema spp., including T. pallidum (for example the rare 
outer membrane proteins), T. denticola, T. hyodysenteriae; or derived from parasites such 
as Plasmodium spp., including P. falciparum; Toxoplasma spp., including T. gondii (for 
example SAG2, SAG3, Tg34); Entamoeba spp., including E. histolytica; Babesia spp., 
including B. microti; Trypanosoma spp., including T. cruzi; Giardia spp., including G. 
lamblia; Leishmania spp., including L. major; Pneumocystis spp., including P. carinii; 
Trichomonas spp., including T. vaginalis; Schistosoma spp., including S. mansoni, or 
derived from yeast such as Candida spp., including C. albicans; Cryptococcus spp., 
including C. neoformans. 

Other preferred specific antigens for M. tuberculosis are for example Tb Ra12, Tb 
H9, Tb Ra35, Tb38-1, Erd 14, DPV, MTI, MSL, mTCC2 and hTCC1 (WO 99/51748). 
Proteins for M. tuberculosis also include fusion proteins and variants thereof where at least 
two, preferably three polypeptides of M. tuberculosis are fused into a larger protein. 
Preferred fusions include Ra12-TbH9-Ra35, Erd14-DPV-MTI, DPV-MTI-MSL, 
Erd14-DPV-MTI-MSL-mTCC2, Erd14-DPV-MTI-MSL, DPV-MTI-MSL-mTCC2, TbH9-DPV-MTI (WO 
99/51748). 



Most preferred antigens for Chlamydia include for example the High Molecular 
Weight Protein (HMWP) (WO 99/17741), ORF3 (EP 366 412), and putative membrane 
proteins (Pmps). Other Chlamydia antigens of the vaccine formulation can be selected 
from the group described in WO 99/28475. 

Preferred bacterial antigens are derived from Streptococcus spp, including S. 
pneumoniae (for example capsular polysaccharides and conjugates thereof, PsaA, PspA, 
streptolysin, choiine-btnding proteins) and the protein antigen Pneumoiysin (Biochem 
Biophys Acta, 1989, 67, 1007; Rubins et ah, Microbial Pathogenesis, 25, 337-342), and 
mutant detoxified derivatives thereof (WO 90/06951; WO 99/03884). Other preferred 
bacterial antigens are derived from Haemophilus spp., including H. influenzae type B (for 
example PRP and conjugates thereof), non typeable H. influenzae, for example OMP26, 
high molecular weight adhesins, P5, P6, protein D and lipoprotein D, and fimbrin and 
fimbrin derived peptides (US 5,843,464) or multiple copy variants or fusion proteins 
thereof. 

Derivatives of Hepatitis B Surface antigen are well known in the art and include, 
inter alia, those PreS1, PreS2 S antigens set forth in European Patent 
applications EP-A-414 374, EP-A-0304 578 and EP 198-474. In one preferred aspect the HBV 
antigen is HBV polymerase (Ji Hoon Jeong et al., 1996, BBRC 223, 264-271; Lee H.J. et 
al., Biotechnol. Lett. 15, 821-826). In another preferred aspect the antigen within the fusion 
is a HIV-1 antigen, gp120, especially when expressed in CHO cells. In a further 
embodiment, the antigen comprises gD2t as hereinabove defined. 

In a preferred embodiment of the present invention fusions comprise an antigen 
derived from the Human Papilloma Virus (HPV 6a, 6b, 11, 16, 18, 31, 33, 35, 39, 45, 51, 
52, 56, 58, 59 and 68), in particular those HPV serotypes considered to be responsible for 
genital warts (HPV 6 or HPV 11 and others), and the HPV viruses responsible for cervical 
cancer (HPV16, HPV18 and others). 

Suitable HPV antigens are E1, E2, E4, E5, E6, E7, L1 and L2. Particularly 
preferred forms of genital wart prophylactic, or therapeutic, fusions comprise L1 particles 
or capsomers, and fusion proteins comprising one or more antigens selected from the HPV 
6 and HPV 11 proteins E6, E7, L1, and L2. 

The most preferred forms of fusion protein are: L2E7 as disclosed in WO 96/26277, 
and proteinD(1/3)-E7 disclosed in GB 9717953.5 (PCT/EP98/05285). 

A preferred HPV cervical infection or cancer prophylaxis or therapeutic vaccine 
composition may comprise HPV 16 or 18 antigens. For example, L1 or L2 antigen 
monomers, or L1 or L2 antigens presented together as a virus like particle (VLP) or the L1 
protein presented alone in a VLP or capsomer structure. Such antigens, virus like 
particles and capsomers are per se known. See for example WO94/00152, WO94/20137, 
WO94/05792, and WO93/02184. 

Additional early proteins may be included alone or as fusion proteins such as E7, 
E2 or preferably E5 for example; particularly preferred embodiments of this include a VLP 
comprising L1 E7 fusion proteins (WO 96/11272). 

Particularly preferred HPV 16 antigens comprise the early proteins E6 or E7 in 
fusion with a protein D carrier to form Protein D - E6 or E7 fusions from HPV 16, or 
combinations thereof; or combinations of E6 or E7 with L2 (WO 96/26277). 

Alternatively the HPV 16 or 18 early proteins E6 and E7 may be presented in a 
single molecule, preferably a Protein D - E6/E7 fusion. Other fusions optionally contain 
either or both E6 and E7 proteins from HPV 18, preferably in the form of a Protein D - E6 
or Protein D - E7 fusion protein or Protein D - E6/E7 fusion protein. Fusions may comprise 
antigens from other HPV strains, preferably from strains HPV 31 or 33. 

Fusions according to the present invention comprise antigens derived from 
parasites that cause Malaria. For example, preferred antigens from Plasmodium falciparum 
include RTS,S and TRAP. RTS is a hybrid protein comprising substantially all the C- 
terminal portion of the circumsporozoite (CS) protein of P. falciparum linked via four amino 
acids of the preS2 portion of Hepatitis B surface antigen to the surface (S) antigen of 
hepatitis B virus. Its full structure is disclosed in the International Patent Application No. 
PCT/EP92/02591, published under Number WO 93/10152 claiming priority from UK patent 
application No. 9124390.7. When expressed in yeast RTS is produced as a lipoprotein 
particle, and when it is co-expressed with the S antigen from HBV it produces a mixed 
particle known as RTS,S. TRAP antigens are described in the International Patent 
Application No. PCT/GB89/00895, published under WO 90/01496. A preferred 
embodiment of the present invention is a fusion wherein the antigenic preparation 
comprises a combination of the RTS,S and TRAP antigens. Other Plasmodia antigens that 
are likely candidates to be components of the fusion are P. falciparum MSP1, AMA1, 
MSP3, EBA, GLURP, RAP1, RAP2, Sequestrin, PfEMP1, Pf332, LSA1, LSA3, STARP, 
SALSA, PfEXP1, Pfs25, Pfs28, PFS27/25, Pfs16, Pfs48/45, Pfs230 and their analogues in 
Plasmodium spp. 

The present invention also provides a polynucleotide encoding the fusion partner 
according to the present invention. The invention further relates to polynucleotides that 
hybridise to the polynucleotide sequence provided herein in figure 1 (SEQ ID NO:9 to 16). 
In this regard, the invention especially relates to polynucleotides that hybridise under 
stringent conditions to the polynucleotide described herein. As herein used, the terms 
"stringent conditions" and "stringent hybridisation conditions" mean hybridisation occurring 
only if there is at least 95% and preferably at least 97% identity between the sequences. A 
specific example of stringent hybridization conditions is overnight incubation at 42°C in a 
solution comprising: 50% formamide, 5x SSC (150mM NaCl, 15mM trisodium citrate), 50 
mM sodium phosphate (pH7.6), 5x Denhardt's solution, 10% dextran sulfate, and 20 
micrograms/ml of denatured, sheared salmon sperm DNA, followed by washing the 
hybridisation support in 0.1 x SSC at about 65°C. Hybridisation and wash conditions are 
well known and exemplified in Sambrook, et al., Molecular Cloning: A Laboratory Manual, 
Second Edition, Cold Spring Harbor, N.Y., (1989), particularly Chapter 11 therein. Solution 
hybridisation may also be used with the polynucleotide sequences provided by the 
invention. 
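
The identity thresholds used above to define stringent conditions (at least 95%, preferably at least 97%) can be illustrated with a short calculation. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length, with '-' marking a gap; the function names and the 20-base example sequences are hypothetical and are not taken from SEQ ID NO:9 to 16. Identity is used here only as a proxy, since actual hybridisation also depends on the buffer and temperature conditions described above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns that hold the same base in both sequences.

    Assumes the inputs are pre-aligned and of equal length. A column in which
    either sequence has a gap ('-') is counted as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_stringency(aligned_a: str, aligned_b: str, threshold: float = 95.0) -> bool:
    """True if the pair reaches the identity threshold (default 95%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical example: 20 aligned bases differing at the final position only.
reference = "ATGGCTAGCTAGGCTTACGA"
variant = "ATGGCTAGCTAGGCTTACGT"
print(percent_identity(reference, variant))        # 19 of 20 columns match
print(meets_stringency(reference, variant))        # 95% threshold
print(meets_stringency(reference, variant, 97.0))  # preferred 97% threshold
```

In this example the pair satisfies the 95% threshold but not the preferred 97% threshold.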

The present invention also provides a polynucleotide encoding the polypeptide 
comprising the fusion partner according to the present invention fused to a tumour 
associated antigen or fragment thereof. 

Such polynucleotide sequences can be inserted into a suitable expression vector 
and expressed in a suitable host. Vectors may be provided which encode the modified 
choline binding protein of the invention and which contain a suitable restriction site into 
which a DNA encoding a poorly immunogenic protein can be inserted to produce a fusion 
protein. 

In other embodiments of the invention, polynucleotide sequences or fragments 
thereof which encode polypeptide fusions of the invention, may be used in recombinant 
DNA molecules to direct expression of a polypeptide in appropriate host cells. Due to the 
inherent degeneracy of the genetic code, other DNA sequences that encode substantially 
the same or a functionally equivalent amino acid sequence may be produced and these 
sequences may be used to clone and express a given polypeptide. 

As will be understood by those of skill in the art, it may be advantageous in some 
instances to produce polypeptide-encoding nucleotide sequences possessing non- 
naturally occurring codons. The DNA code has 4 letters (A, T, C and G) and uses these to 
spell three letter "codons" which represent the amino acids of the proteins encoded in an 
organism's genes. The linear sequence of codons along the DNA molecule is translated 
into the linear sequence of amino acids in the protein(s) encoded by those genes. The 
code is highly degenerate, with 61 codons coding for the 20 natural amino acids and 3 
codons representing "stop" signals. Thus, most amino acids are coded for by more than 
one codon - in fact several are coded for by four or more different codons. 
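
The degeneracy figures quoted above can be checked by enumerating the standard genetic code. The sketch below is illustrative only: the variable names are arbitrary, and the table is the standard code written with codons enumerated in T, C, A, G order and '*' marking a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_residue = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, residue in CODON_TABLE.items() if residue == "*")
sense_codon_count = sum(n for residue, n in codons_per_residue.items() if residue != "*")
amino_acid_count = len(codons_per_residue) - 1  # exclude the '*' entry
four_or_more = sorted(
    residue for residue, n in codons_per_residue.items() if residue != "*" and n >= 4
)

print(sense_codon_count, amino_acid_count, stop_codons)
print(four_or_more)
```

The enumeration gives 61 sense codons for 20 amino acids plus three stop codons, and eight amino acids (Ala, Gly, Leu, Pro, Arg, Ser, Thr and Val) that are each encoded by four or more codons.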

Where more than one codon is available to code for a given amino acid, it has 
been observed that the codon usage patterns of organisms are highly non-random. 
Different species show a different bias in their codon selection and, furthermore, utilisation 
of codons may be markedly different in a single species between genes which are 
expressed at high and low levels. This bias is different in viruses, plants, bacteria and 
mammalian cells, and some species show a stronger bias away from a random codon 
selection than others. For example, humans and other mammals are less strongly biased 
than certain bacteria or viruses. For these reasons, there is a significant probability that a 
mammalian gene expressed in E.coli or a viral gene expressed in mammalian cells will 
have an inappropriate distribution of codons for efficient expression. It is believed that the 
presence in a heterologous DNA sequence of clusters of codons which are rarely 
observed in the host in which expression is to occur, is predictive of low heterologous 
expression levels in that host. 
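
The notion of a cluster of rarely used codons can be made concrete with a simple sliding-window scan. The following is a rough sketch under stated assumptions, not a method described in this specification: the function name, window size and threshold are hypothetical, and the example set of rare codons (AGA, AGG, ATA and CTA, which are commonly cited as infrequently used in E. coli) is illustrative and should be replaced by a measured codon usage table for the intended host.

```python
def rare_codon_clusters(cds, rare_codons, window=5, min_rare=3):
    """Return (start_codon_index, rare_count) for every window of `window`
    consecutive codons that contains at least `min_rare` rare codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    flags = [codon in rare_codons for codon in codons]
    hits = []
    for start in range(max(len(codons) - window + 1, 0)):
        count = sum(flags[start:start + window])
        if count >= min_rare:
            hits.append((start, count))
    return hits


# Illustrative assumption: a small set of codons treated as rare in the host.
EXAMPLE_RARE_CODONS = {"AGA", "AGG", "ATA", "CTA"}

# Hypothetical coding sequence: ATG GCT AGA AGG ATA GCT GCT GCT GCT TAA
example_cds = "ATGGCTAGAAGGATAGCTGCTGCTGCTTAA"
print(rare_codon_clusters(example_cds, EXAMPLE_RARE_CODONS))
```

Windows that are flagged in this way mark candidate regions for codon replacement; the scan itself does not predict expression levels.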

In consequence, codons preferred by a particular prokaryotic (for example E. coli) or 
eukaryotic (for example yeast) host can be optimised, that is selected to increase the rate of protein 
expression, to produce a recombinant RNA transcript having desirable properties, such as 
for example a half-life which is longer than that of a transcript generated from the naturally 
occurring sequence, or to optimise the immune response in humans. The process of 
codon optimisation may include any sequence, generated either manually or by computer 
software, where some or all of the codons of the native sequence are modified. Several 
methods have been published (Nakamura et al., Nucleic Acids Research 1996, 24:214-215; 
WO98/34640). One preferred method according to this invention is the Syngene method, a 
modification of the Calcgene method (R. S. Hale and G. Thompson, Protein Expression and 
Purification Vol. 12 pp. 185-188 (1998)). 

This process of codon optimisation may have some or all of the following benefits: 
1) to improve expression of the gene product by replacing rare or infrequently used codons 
with more frequently used codons, 2) to remove or include restriction enzyme sites to 
facilitate downstream cloning and 3) to reduce the potential for homologous recombination 
between the insert sequence in the DNA vector and genomic sequences and 4) to improve 
the immune response in humans. Due to the nature of the algorithms used by the 
SynGene programme to generate a codon optimised sequence, it is possible to generate 
an extremely large number of different codon optimised sequences which will perform a 
similar function. In brief, the codons are assigned using a statistical method to give a 
synthetic gene having a codon frequency closer to that found naturally in highly expressed 
E. coli and human genes. Illustrative, although non limiting, examples of suitable codon- 
optimised sequences are given in SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:24 and SEQ ID NO:25. 
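
The idea of assigning codons statistically, and the reason many different optimised sequences can encode the same protein, can be sketched as follows. This is not the SynGene or Calcgene algorithm, whose details are not reproduced here; it is a minimal illustration in which each codon is drawn at random with a probability proportional to an assumed usage frequency. The frequency values, the function names and the short peptide are all hypothetical.

```python
import random

# Hypothetical relative codon frequencies for a few residues ('*' = stop).
# Real work would use a codon usage table measured for the intended host.
HYPOTHETICAL_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "L": {"CTG": 0.6, "CTT": 0.1, "CTC": 0.1, "TTA": 0.1, "TTG": 0.1},
    "E": {"GAA": 0.7, "GAG": 0.3},
    "*": {"TAA": 1.0},
}


def back_translate(protein, usage, rng):
    """Pick one codon per residue, weighted by its usage frequency."""
    codons = []
    for residue in protein:
        choices = usage[residue]
        codon = rng.choices(list(choices), weights=list(choices.values()), k=1)[0]
        codons.append(codon)
    return "".join(codons)


def translate(dna, usage):
    """Translate a DNA string back to protein using the codons in `usage`."""
    codon_to_residue = {c: aa for aa, cs in usage.items() for c in cs}
    return "".join(codon_to_residue[dna[i:i + 3]] for i in range(0, len(dna), 3))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
protein = "MKLEL*"      # hypothetical peptide
variants = {back_translate(protein, HYPOTHETICAL_USAGE, rng) for _ in range(50)}

# Every sampled DNA sequence encodes the same peptide.
print(len(variants), all(translate(v, HYPOTHETICAL_USAGE) == protein for v in variants))
```

Because each position is sampled independently, repeated runs yield distinct DNA sequences whose overall codon frequencies approach the target table while the encoded protein is unchanged.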



A DNA sequence encoding the fusion proteins or modified choline binding protein 
of the present invention can be synthesised using standard DNA synthesis techniques, 
such as by enzymatic ligation as described by D.M. Roberts et al. in Biochemistry 1985, 
24, 5090-5098, by chemical synthesis, by in vitro enzymatic polymerisation, or by PCR 
technology utilising for example a heat stable polymerase, or by a combination of these 
techniques. 

Enzymatic polymerisation of DNA may be carried out in vitro using a DNA 
polymerase such as DNA polymerase I (Klenow fragment) or Taq polymerase in an 
appropriate buffer containing the nucleoside triphosphates dATP, dCTP, dGTP and dTTP 
as required at a temperature of 10°-37°C, generally in a volume of 50 µl or less. Enzymatic 
ligation of DNA fragments may be carried out using a DNA ligase such as T4 DNA ligase 
in an appropriate buffer, such as 0.05M Tris (pH 7.4), 0.01M MgCl2, 0.01M dithiothreitol, 
1mM spermidine, 1mM ATP and 0.1mg/ml bovine serum albumin, at a temperature of 4°C 
to ambient, generally in a volume of 50 µl or less. The chemical synthesis of the DNA 
polymer or fragments may be carried out by conventional phosphotriester, phosphite or 
phosphoramidite chemistry, using solid phase techniques such as those described in 
'Chemical and Enzymatic Synthesis of Gene Fragments - A Laboratory Manual' (ed. H.G. 
Gassen and A. Lang), Verlag Chemie, Weinheim (1982), or in other scientific publications, 
for example M.J. Gait, H.W.D. Matthes, M. Singh, B.S. Sproat, and R.C. Titmas, Nucleic 
Acids Research, 1982, 10, 6243; B.S. Sproat, and W. Bannwarth, Tetrahedron Letters, 
1983, 24, 5771; M.D. Matteucci and M.H. Caruthers, Tetrahedron Letters, 1980, 21, 719; 
M.D. Matteucci and M.H. Caruthers, Journal of the American Chemical Society, 1981, 103, 
3185; S.P. Adams et al, Journal of the American Chemical Society, 1983, 105, 661; N.D. 
Sinha, J. Biernat, J. McMannus, and H. Koester, Nucleic Acids Research, 1984, 12, 4539; 
and H.W.D. Matthes et al., EMBO Journal, 1984, 3, 801. 

The process of the invention may be performed by conventional recombinant 
techniques such as described in Maniatis et al., Molecular Cloning - A Laboratory Manual; 
Cold Spring Harbor, 1982-1989. 

In particular, the process may comprise the steps of : 

i) preparing a replicable or integrating expression vector capable, in a host 
cell, of expressing a DNA polymer comprising a nucleotide sequence that encodes the 
protein or an immunogenic derivative thereof; 

ii) transforming a host cell with said vector; 

iii) culturing said transformed host cell under conditions permitting expression 
of said DNA polymer to produce said protein; and 

iv) recovering said protein. 

The term 'transforming' is used herein to mean the introduction of foreign DNA into 
a host cell. This can be achieved for example by transformation, transfection or infection 
with an appropriate plasmid or viral vector using e.g. conventional techniques as described 
in Genetic Engineering; Eds. S.M. Kingsman and A.J. Kingsman; Blackwell Scientific 
Publications; Oxford, England, 1988. The term 'transformed' or 'transformant' will 
hereafter apply to the resulting host cell containing and expressing the foreign gene of 
interest. 

The expression vectors are novel and also form part of the invention. 

The replicable expression vectors may be prepared in accordance with the 
invention, by cleaving a vector compatible with the host cell to provide a linear DNA 
segment having an intact replicon, and combining said linear segment with one or more 
DNA molecules which, together with said linear segment encode the desired product, such 
as the DNA polymer encoding the protein of the invention, or derivative thereof, under 
ligating conditions. 

Thus, the DNA polymer may be preformed or formed during the construction of the 
vector, as desired. 

The choice of vector will be determined in part by the host cell, which may be 
prokaryotic or eukaryotic but is preferably E. coli, yeast or CHO cells. Suitable vectors 
include plasmids, bacteriophages, cosmids and recombinant viruses. Expression and 
cloning vectors preferably contain a selectable marker such that only the host cells 
expressing the marker will survive under selective conditions. Selection genes include but 
are not limited to those encoding proteins that confer resistance to ampicillin, tetracycline 
or kanamycin. Expression vectors also contain control sequences which are compatible 
with the designated host. For example, expression control sequences for E. coli, and more 
generally for prokaryotes, include promoters and ribosome binding sites. Promoter 
sequences may be naturally occurring, such as the β-lactamase (penicillinase) (Weissman 
1981, In Interferon 3 (ed. L. Gresser)), lactose (lac) (Chang et al. Nature, 1977, 198: 1056), 
tryptophan (trp) (Goeddel et al. Nucl. Acids Res. 1980, 8, 4057) and lambda-derived 
PL promoter systems. In addition, synthetic promoters which do not occur in nature also 
function as bacterial promoters. This is the case for example for the tac synthetic hybrid 
promoter which is derived from sequences of the trp and lac promoters (De Boer et al., 
Proc. Natl Acad. Sci. USA 1983, 80, 21-26). These systems are particularly suitable with E. 
coli. 

Yeast compatible vectors also carry markers that allow the selection of successful 
transformants by conferring prototrophy to auxotrophic mutants or resistance to heavy 
metals on wild-type strains. Expression control sequences for yeast vectors include 
promoters for glycolytic enzymes (Hess et al., J. Adv. Enzyme Reg. 1968, 7, 149), PHO5 
gene encoding acid phosphatase, CUP1 gene, ARG3 gene, GAL genes promoters and 
synthetic promoter sequences. Other control elements useful in yeast expression are 
terminators and mRNA leader sequences. The 5' coding sequence is particularly useful 
since it typically encodes a signal peptide comprised of hydrophobic amino acids which 
direct the secretion of the protein from the cell. Suitable signal sequences can be encoded 
by genes for secreted yeast proteins such as the yeast invertase gene and the a-factor 
gene, acid phosphatase, killer toxin, the alpha-mating factor gene and recently the 
heterologous inulinase signal sequence derived from INU1A gene of Kluyveromyces 
marxianus. Suitable vectors have been developed for expression in Pichia pastoris and 
Saccharomyces cerevisiae. 

A variety of P. pastoris expression vectors are available based on various inducible 
or constitutive promoters (Cereghino and Cregg, FEMS Microbiol. Rev. 2000, 24:45-66). 
For the production of cytosolic and secreted proteins, the most commonly used P. pastoris 
vectors contain the very strong and tightly regulated alcohol oxidase (AOX1) promoter. 
The vectors also contain the P. pastoris histidinol dehydrogenase (HIS4) gene for selection 
in his4 hosts. Secretion of foreign protein requires the presence of a signal sequence and 
the S. cerevisiae prepro alpha mating factor signal sequence has been widely and 
successfully used in the Pichia expression system. Expression vectors are integrated into the 
P. pastoris genome to maximize the stability of expression strains. As in S. cerevisiae, 
cleavage of a P. pastoris expression vector within a sequence shared by the host genome 
(AOX1 or HIS4) stimulates homologous recombination events that efficiently target 
integration of the vector to that genomic locus. In general, a recombinant strain that 
contains multiple integrated copies of an expression cassette can yield more heterologous 
protein than a single-copy strain. The most effective way to obtain high copy number 
transformants requires the transformation of the Pichia recipient strain by the sphaeroplast 
technique (Cregg et al. 1985, Mol. Cell. Biol. 5: 3376-3385). 

The preparation of the replicable expression vector may be carried out 
conventionally with appropriate enzymes for restriction, polymerisation and ligation of the 
DNA, by procedures described in, for example, Maniatis et al cited above. 

The recombinant host cell is prepared, in accordance with the invention, by 
transforming a host cell with a replicable expression vector of the invention under 
transforming conditions. Suitable transforming conditions are conventional and are 
described in, for example, Maniatis et al. cited above, or "DNA Cloning" Vol. II, D.M. Glover 
ed., IRL Press Ltd, 1985. 

The choice of transforming conditions depends upon the choice of the host cell to 
be transformed. For example, in vivo transformation using a live viral vector as the 
transforming agent for the polynucleotides of the invention is described above. Bacterial 
transformation of a host such as E. coli may be done by direct uptake of the 
polynucleotides (which may be expression vectors containing the desired sequence) after 
the host has been treated with a solution of CaCl2 (Cohen et al., Proc. Natl. Acad. Sci., 
1973, 69, 2110) or with a solution comprising a mixture of rubidium chloride (RbCl), 
MnCl2, potassium acetate and glycerol, and then with 3-[N-morpholino]-propane-sulphonic 
acid, RbCl and glycerol or by electroporation. Transformation of lower eukaryotic 
organisms such as yeast cells in culture by direct uptake may be carried out for example 
by using the method of Hinnen et al (Proc. Natl. Acad. Sci. 1978, 75 : 1929-1933). 
Mammalian cells in culture may be transformed using the calcium phosphate co- 
precipitation of the vector DNA onto the cells (Graham & Van der Eb, Virology 1978, 52, 
546). Other methods for introduction of polynucleotides into mammalian cells include 
dextran mediated transfection, polybrene mediated transfection, protoplast fusion, 
electroporation, encapsulation of the polynucleotide(s) into liposomes, and direct micro- 
injection of the polynucleotides into nuclei. 

The invention also extends to a host cell transformed with a nucleic acid encoding 
the protein of the invention or a replicable expression vector of the invention. 

Culturing the transformed host cell under conditions permitting expression of the 
DNA polymer is carried out conventionally, as described in, for example, Maniatis et al. 
and "DNA Cloning" cited above. Thus, preferably the cell is supplied with nutrient and 
cultured at a temperature below 50°C, preferably between 25°C and 42°C, more preferably 
between 25°C and 35°C, most preferably at 30°C. The incubation time may vary from a 
few minutes to a few hours, according to the proportion of the polypeptide in the bacterial 
cell, as assessed by SDS-PAGE or Western blot. 

The product may be recovered by conventional methods according to the host cell 
and according to the localisation of the expression product (intracellular or secreted into 
the culture medium or into the cell periplasm). Thus, where the host cell is bacterial, such 
as E. coli, it may, for example, be lysed physically, chemically or enzymatically and the 
protein product isolated from the resulting lysate. Where the host cell is mammalian, the 
product may generally be isolated from the nutrient medium or from cell free extracts. 
Where the host cell is a yeast such as Saccharomyces cerevisiae or Pichia pastoris, the 
product may generally be isolated from lysed cells or from the culture medium, and 
then further purified using conventional techniques. The specificity of the expression 
system may be assessed by western blot or by ELISA using an antibody directed against 
the polypeptide of interest. 



Conventional protein isolation techniques include selective precipitation, adsorption 
chromatography, and affinity chromatography including a monoclonal antibody affinity 
column. When the proteins of the present invention are expressed with a histidine tail (His 
tag), they can easily be purified by affinity chromatography using an immobilised metal affinity 
chromatography (IMAC) column. The metal ion may be any suitable ion, for 
example zinc, nickel, iron, magnesium or copper, but is preferably zinc or nickel. 
Preferably the IMAC buffer contains detergent, preferably an anionic detergent such as 
SDS, more preferably a non-ionic detergent such as Tween 80, or a zwitterionic detergent 
such as Empigen BB, as this may result in lower levels of endotoxin in the final product. 

Further chromatographic steps include for example a Q-Sepharose step that may 
be operated either before or after the IMAC column. Preferably the pH is in the range of 
7.5 to 10, more preferably from 7.5 to 9.5, optimally between 8 and 9. 

The proteins of the invention can thus be purified according to the following 
protocol. After cell disruption, cell extracts containing the protein can be solubilised in a 
pH 8.5 Tris buffer containing urea (8.0 M for example), and SDS (from 0.5% to 1% for 
example). After centrifugation, the resulting supernatant may then be loaded onto an 
IMAC (Nickel) Sepharose FF column equilibrated with a pH 8.5 Tris buffer. The column 
may then be washed with a high salt containing buffer (eg 0.75 - 1.5 M NaCl, 15 mM pH 
8.5 Tris buffer). The column may optionally then be washed again with phosphate buffer 
without salt. The proteins of the invention may be eluted from the column with an 
imidazole-containing buffered solution. The proteins can then be submitted to an 
additional chromatographic step, such as anion exchange chromatography (Q 
Sepharose for example). 

The proteins of the present invention are provided either soluble in a liquid form or 
in a lyophilised form, which is the preferred form. It is generally expected that each human 
dose will comprise 1 to 1000 µg of protein, and preferably 30-300 µg. The purification 
process can also include a carboxyamidation step whereby the protein is first reduced in 
the presence of glutathione and then carboxymethylated in the presence of iodoacetamide. 
This step offers the advantage of controlling the oxidative aggregation of the molecule with 
itself or with host cell protein contaminants through covalent bridging with disulphide 
bonds. 

The present invention also provides pharmaceutical and immunogenic 
compositions comprising a protein of the present invention in a pharmaceutically 
acceptable excipient. 

A preferred vaccine composition comprises at least a protein according to the invention. 
Said protein has, preferably, blocked thiol groups and is highly purified, e.g. has less than 
5% host cell contamination. Such vaccine may optionally contain one or more other 
tumour-associated antigens and derivatives. For example, suitable other associated antigens 
include prostase, PAP-1, PSA (prostate specific antigen), PSMA (prostate-specific 
membrane antigen), PSCA (Prostate Stem Cell Antigen), STEAP. 

In another embodiment, illustrative immunogenic compositions, such as for 
example vaccine compositions, of the present invention comprise DNA encoding one or 
more of the fusion polypeptides as described above, such that the fusion polypeptide is 
generated in situ. As noted above, the polynucleotide may be administered within any of a 
variety of delivery systems known to those of ordinary skill in the art. Indeed, numerous 
gene delivery techniques are well known in the art, such as those described by Rolland, 
Crit. Rev. Therap. Drug Carrier Systems 15:143-198, 1998, and references cited therein. 
Appropriate polynucleotide expression systems will, of course, contain the necessary 
DNA regulatory sequences for expression in a patient (such as a suitable 
promoter and terminating signal). Alternatively, bacterial delivery systems may involve the 
administration of a bacterium (such as Bacillus Calmette-Guérin) that expresses an 
immunogenic portion of the polypeptide on its cell surface or secretes such an epitope. 

Therefore, in certain embodiments, polynucleotides encoding immunogenic 
polypeptides described herein are introduced into suitable mammalian host cells for 
expression using any of a number of known viral-based systems. In one illustrative 
embodiment, retroviruses provide a convenient and effective platform for gene delivery 
systems. A selected nucleotide sequence encoding a polypeptide of the present invention 
can be inserted into a vector and packaged in retroviral particles using techniques known 
in the art. The recombinant virus can then be isolated and delivered to a subject. A number 
of illustrative retroviral systems have been described (e.g., U.S. Pat. No. 5,219,740; Miller 
and Rosman (1989) BioTechniques 7:980-990; Miller, A. D. (1990) Human Gene Therapy 
1:5-14; Scarpa et al. (1991) Virology 180:849-852; Burns et al. (1993) Proc. Natl. Acad. 
Sci. USA 90:8033-8037; and Boris-Lawrie and Temin (1993) Cur. Opin. Genet. Develop. 
3:102-109). 

In addition, a number of illustrative adenovirus-based systems have also been 
described. Unlike retroviruses which integrate into the host genome, adenoviruses persist 
extrachromosomally thus minimizing the risks associated with insertional mutagenesis 
(Haj-Ahmad and Graham (1986) J. Virol. 57:267-274; Bett et al. (1993) J. Virol. 67:5911- 
5921; Mittereder et al. (1994) Human Gene Therapy 5:717-729; Seth et al. (1994) J. Virol. 
68:933-940; Barr et al. (1994) Gene Therapy 1:51-58; Berkner, K. L. (1988) BioTechniques 
6:616-629; and Rich et al. (1993) Human Gene Therapy 4:461-476). 

Various adeno-associated virus (AAV) vector systems have also been developed 
for polynucleotide delivery. AAV vectors can be readily constructed using techniques well 
known in the art. See, e.g., U.S. Pat. Nos. 5,173,414 and 5,139,941; International 
Publication Nos. WO 92/01070 and WO 93/03769; Lebkowski et al. (1988) Molec. Cell. 
Biol. 8:3988-3996; Vincent et al. (1990) Vaccines 90 (Cold Spring Harbor Laboratory 
Press); Carter, B. J. (1992) Current Opinion in Biotechnology 3:533-539; Muzyczka, N. 
(1992) Current Topics in Microbiol. and Immunol. 158:97-129; Kotin, R. M. (1994) Human 
Gene Therapy 5:793-801; Shelling and Smith (1994) Gene Therapy 1:165-169; and Zhou 
et al. (1994) J. Exp. Med. 179:1867-1875. 

Additional viral vectors useful for delivering the nucleic acid molecules encoding 
polypeptides of the present invention by gene transfer include those derived from the pox 
family of viruses, such as vaccinia virus and avian poxvirus. By way of example, vaccinia 
virus recombinants expressing the novel molecules can be constructed as follows. The 
DNA encoding a polypeptide is first inserted into an appropriate vector so that it is adjacent 
to a vaccinia promoter and flanking vaccinia DNA sequences, such as the sequence 
encoding thymidine kinase (TK). This vector is then used to transfect cells which are 
simultaneously infected with vaccinia. Homologous recombination serves to insert the 
vaccinia promoter plus the gene encoding the polypeptide of interest into the viral genome. 
The resulting TK(-) recombinant can be selected by culturing the cells in the presence 
of 5-bromodeoxyuridine and picking viral plaques resistant thereto. 

A vaccinia-based infection/transfection system can be conveniently used to provide 
for inducible, transient expression or coexpression of one or more polypeptides described 
herein in host cells of an organism. In this particular system, cells are first infected in vitro 
with a vaccinia virus recombinant that encodes the bacteriophage T7 RNA polymerase. 
This polymerase displays exquisite specificity in that it only transcribes templates bearing 
T7 promoters. Following infection, cells are transfected with the polynucleotide or 
polynucleotides of interest, driven by a T7 promoter. The polymerase expressed in the 
cytoplasm from the vaccinia virus recombinant transcribes the transfected DNA into RNA 
which is then translated into polypeptide by the host translational machinery. The method 
provides for high level, transient, cytoplasmic production of large quantities of RNA and its 
translation products. See, e.g., Elroy-Stein and Moss, Proc. Natl. Acad. Sci. USA (1990) 
87:6743-6747; Fuerst et al. Proc. Natl. Acad. Sci. USA (1986) 83:8122-8126. 

Alternatively, avipoxviruses, such as the fowlpox and canarypox viruses, can also 
be used to deliver the coding sequences of interest. Recombinant avipox viruses, 
expressing immunogens from mammalian pathogens, are known to confer protective 
immunity when administered to non-avian species. The use of an Avipox vector is 
particularly desirable in human and other mammalian species since members of the 
Avipox genus can only productively replicate in susceptible avian species and therefore 
are not infective in mammalian cells. Methods for producing recombinant Avipoxviruses 
are known in the art and employ genetic recombination, as described above with respect 
to the production of vaccinia viruses. See, e.g., WO 91/12882; WO 89/03429; and WO 
92/03545. 

Any of a number of alphavirus vectors can also be used for delivery of 
polynucleotide compositions of the present invention, such as those vectors described in 
U.S. Patent Nos. 5,843,723; 6,015,686; 6,008,035 and 6,015,694. Certain vectors based 
on Venezuelan Equine Encephalitis (VEE) can also be used, illustrative examples of which 
can be found in U.S. Patent Nos. 5,505,947 and 5,643,576. 

In another embodiment of the invention, a polynucleotide is administered/delivered 
as "naked" DNA, for example as described in Ulmer et al., Science 259:1745-1749, 1993 
and reviewed by Cohen, Science 259:1691-1692, 1993. The uptake of naked DNA may 
be increased by coating the DNA onto biodegradable beads, which are efficiently 
transported into the cells. 

The fusion proteins and encoding polynucleotides according to the invention can also 
be formulated as a pharmaceutical composition, e.g. as a vaccine. 

It is possible for the vaccine composition to be administered on a one-off basis or, 
preferably, to be administered repeatedly, as many times as necessary, for example 
between 1 and 7 times, preferably between 1 and 4 times, at intervals of between about 1 
day and about 18 months, preferably one month. This may optionally be followed by 
dosing at regular intervals of between 1 and 12 months for a period up to the remainder of 
the patient's life. In a preferred embodiment the patient receives the antigen in different 
forms in a "prime boost" regime. Thus, for example, the antigen (the fusion protein) is first 
administered as a DNA-based vaccine and then subsequently administered as an adjuvanted 
protein-based formulation. This administration mode is preferred. The preferred adjuvant is 
a combination of a CpG-containing oligonucleotide and a saponin derivative, particularly 
the combination of CpG and QS21 as disclosed in WO 00/09159 and in WO 00/62800. 
The uptake of naked DNA may be increased by coating the DNA onto biodegradable 
beads, which are efficiently transported into the cells. Alternatively the DNA can be 
delivered via a particle bombardment approach, for example, gas-driven particle 
acceleration with devices such as those manufactured by Powderject Pharmaceuticals 
PLC (Oxford, UK) and Powderject Vaccines Inc. (Madison, WI), some examples of which 
are described in U.S. Patent Nos. 5,846,796; 6,010,478; 5,865,796; 5,584,807; and EP 
Patent No. 0500 799. This approach offers needle-free delivery, wherein a dry 
powder formulation of microscopic particles, such as polynucleotide or polypeptide 
particles, is accelerated to high speed within a helium gas jet generated by a hand-held 
device, propelling the particles into a target tissue of interest. 



In another preferred embodiment, the adjuvanted protein will be administered first, 
followed by the DNA based vaccine. Still another embodiment will concern the delivery of 
the DNA construct by means of specialised delivery vectors, preferably by means of a 
viral system, most preferably by means of adenoviral-based systems. Other suitable 
viral-based systems of DNA delivery include retroviral, lentiviral, adeno-associated viral, 
herpes viral and vaccinia-viral based systems. 

The treatment regime will vary significantly depending upon the size and 
species of patient concerned, the amount of nucleic acid vaccine and/or protein 
composition administered, the route of administration, the potency and dose of any 
adjuvant compounds used and other factors which would be apparent to a skilled medical 
practitioner. 

The fusion proteins of the present invention are preferably provided at least 80% 
pure, more preferably 90% pure, as visualised by SDS PAGE. Preferably the proteins 
appear as a single band by SDS PAGE. 

The present invention also provides a pharmaceutical composition comprising a 
fusion protein of the present invention in a pharmaceutically acceptable excipient. 
Accordingly there is also provided a process for the preparation of an immunogenic 
composition according to the present invention, comprising admixing the fusion protein of 
the invention or the encoding polynucleotide with a suitable adjuvant, diluent or other 
pharmaceutically acceptable carrier. 

Vaccine preparation is generally described in Vaccine Design ("The subunit and 
adjuvant approach", eds. Powell M.F. & Newman M.J., (1995) Plenum Press, New York). 
Encapsulation within liposomes is described by Fullerton, US Patent 4,235,877. 

The fusion proteins of the present invention and encoding polynucleotides are 
preferably adjuvanted in the vaccine formulation of the invention. Certain adjuvants are 
commercially available as, for example, Freund's Incomplete Adjuvant and Complete 
Adjuvant (Difco Laboratories, Detroit, MI); Merck Adjuvant 65 (Merck and Company, Inc., 
Rahway, NJ); AS-2 (SmithKline Beecham, Philadelphia, PA); aluminum salts such as 
aluminum hydroxide gel (alum) or aluminum phosphate; salts of calcium, iron or zinc; an 
insoluble suspension of acylated tyrosine; acylated sugars; cationically or anionically 
derivatised polysaccharides; polyphosphazenes; biodegradable microspheres; 
monophosphoryl lipid A and Quil A. Cytokines, such as GM-CSF, interleukin-2, -7, -12, 
and other like growth factors, may also be used as adjuvants. 

Within certain embodiments of the invention, the adjuvant composition is preferably 
one that induces an immune response predominantly of the Th1 type. High levels of Th1-
type cytokines (e.g., IFN-γ, TNFα, IL-2 and IL-12) tend to favor the induction of cell 
mediated immune responses to an administered antigen. In contrast, high levels of Th2- 
type cytokines (e.g., IL-4, IL-5, IL-6 and IL-10) tend to favor the induction of humoral 
immune responses. Following application of a vaccine as provided herein, a patient will 
support an immune response that includes Th1- and Th2-type responses. Within a 
preferred embodiment, in which a response is predominantly Th1-type, the level of Th1- 
type cytokines will increase to a greater extent than the level of Th2-type cytokines. The 
levels of these cytokines may be readily assessed using standard assays. For a review of 
the families of cytokines, see Mosmann and Coffman, Ann. Rev. Immunol. 7:145-173, 
1989. 

Preferred TH-1 inducing adjuvants are selected from the group of adjuvants 
comprising: 3D-MPL, QS21, a mixture of QS21 and cholesterol, and a CpG oligonucleotide 
or a mixture of two or more said adjuvants. Certain preferred adjuvants for eliciting a 
predominantly Th1-type response include, for example, a combination of monophosphoryl 
lipid A, preferably 3-de-O-acylated monophosphoryl lipid A, together with an aluminum 
salt. MPL® adjuvants are available from Corixa Corporation (Seattle, WA; see, for 
example, US Patent Nos. 4,436,727; 4,877,611; 4,866,034 and 4,912,094). CpG- 
containing oligonucleotides (in which the CpG dinucleotide is unmethylated) also induce a 
predominantly Th1 response. Such oligonucleotides are well known and are described, for 
example, in WO 96/02555, WO 99/33488 and U.S. Patent Nos. 6,008,200 and 5,856,462. 
Immunostimulatory DNA sequences are also described, for example, by Sato et al., 
Science 273:352, 1996. Another preferred adjuvant comprises a saponin, such as Quil A, 
or derivatives thereof, including QS21 and QS7 (Aquila Biopharmaceuticals Inc., 
Framingham, MA); Escin; Digitonin; or Gypsophila or Chenopodium quinoa saponins. 
Other preferred formulations include more than one saponin in the adjuvant combinations 
of the present invention, for example combinations of at least two of the following group 
comprising QS21, QS7, Quil A, β-escin, or digitonin. 

Alternatively the saponin formulations may be combined with vaccine vehicles 
composed of chitosan or other polycationic polymers, polylactide and 
polylactide-co-glycolide particles, poly-N-acetyl glucosamine-based polymer matrix, particles 
composed of polysaccharides or chemically modified polysaccharides, liposomes and lipid-based 
particles, particles composed of glycerol monoesters, etc. The saponins may also be 
formulated in the presence of cholesterol to form particulate structures such as liposomes 
or ISCOMs. Furthermore, the saponins may be formulated together with a polyoxyethylene 
ether or ester, in either a non-particulate solution or suspension, or in a particulate 
structure such as a paucilamellar liposome or ISCOM. The saponins may also be 
formulated with excipients such as Carbopol® to increase viscosity, or may be formulated 
in a dry powder form with a powder excipient such as lactose. 

In one preferred embodiment, the adjuvant system includes the combination of a 
monophosphoryl lipid A and a saponin derivative, such as the combination of QS21 and 
3D-MPL® adjuvant, as described in WO 94/00153, or a less reactogenic composition 
where the QS21 is quenched with cholesterol, as described in WO 96/33739. Other 
preferred formulations comprise an oil-in-water emulsion and tocopherol. Another 
particularly preferred adjuvant formulation employing QS21, 3D-MPL® adjuvant and 
tocopherol in an oil-in-water emulsion is described in WO 95/17210. 

Another enhanced adjuvant system involves the combination of a CpG-containing 
oligonucleotide and a saponin derivative, particularly the combination of CpG and QS21 as 
disclosed in WO 00/09159 and in WO 00/62800. Preferably the formulation additionally 
comprises an oil in water emulsion and tocopherol. 

In a yet further embodiment the present invention provides an immunogenic 
composition comprising a fusion protein according to the invention, and further comprising 
3D-MPL, a saponin, preferably QS21, and a CpG oligonucleotide, optionally formulated in 
an oil in water emulsion. 

Additional illustrative adjuvants for use in the pharmaceutical compositions of the 
invention include Montanide ISA 720 (Seppic, France), SAF (Chiron, California, United 
States), ISCOMS (CSL), MF-59 (Chiron), the SBAS series of adjuvants (e.g., SBAS-2 or 
SBAS-4, available from SmithKline Beecham, Rixensart, Belgium), Detox (Enhanzyn®) 
(Corixa, Hamilton, MT), RC-529 (Corixa, Hamilton, MT) and other aminoalkyl 
glucosaminide 4-phosphates (AGPs), such as those described in pending U.S. Patent 
Application Serial Nos. 08/853,826 and 09/074,720, the disclosures of which are 
incorporated herein by reference in their entireties, and polyoxyethylene ether adjuvants 
such as those described in WO 99/52549A1. 

Other preferred adjuvants include adjuvant molecules of the general formula (I): 
HO(CH₂CH₂O)ₙ-A-R, wherein n is 1-50, A is a bond or -C(O)-, and R is C₁₋₅₀ alkyl or phenyl 
C₁₋₅₀ alkyl. 

One embodiment of the present invention consists of a vaccine formulation 
comprising a polyoxyethylene ether of general formula (I), wherein n is between 1 and 50, 
preferably 4-24, most preferably 9; the R component is C₁₋₅₀, preferably C₄-C₂₀ alkyl and 
most preferably C₁₂ alkyl, and A is a bond. The concentration of the polyoxyethylene 
ethers should be in the range 0.1-20%, preferably from 0.1-10%, and most preferably in 
the range 0.1-1%. Preferred polyoxyethylene ethers are selected from the following group: 
polyoxyethylene-9-lauryl ether, polyoxyethylene-9-stearyl ether, polyoxyethylene-8-stearyl 
ether, polyoxyethylene-4-lauryl ether, polyoxyethylene-35-lauryl ether, and 
polyoxyethylene-23-lauryl ether. Polyoxyethylene ethers such as polyoxyethylene lauryl 
ether are described in the Merck Index (12th edition: entry 7717). These adjuvant 
molecules are described in WO 99/52549. 

The polyoxyethylene ether according to the general formula (I) above may, if 
desired, be combined with another adjuvant. For example, a preferred adjuvant 
combination is with CpG, as described in the pending UK patent application GB 
9820956.2. 

It is an embodiment of the invention that the antigens, including nucleic acid vectors, 
of the invention be utilised with an immunostimulatory agent. Preferably the 
immunostimulatory agent is administered at the same time as the antigens of the invention, 
and in preferred embodiments they are formulated together. Such immunostimulatory agents 
include but are not limited to: synthetic imidazoquinolines such as imiquimod [S-26308, 
R-837] (Harrison, et al., Vaccine 19: 1820-1826, 2001) and resiquimod [S-28463, R-848] 
(Vasilakos, et al., Cellular Immunology 204: 64-74, 2000); Schiff bases of carbonyls and 
amines that are constitutively expressed on antigen presenting cell and T-cell surfaces, 
such as tucaresol (Rhodes, J. et al., Nature 377: 71-75, 1995); cytokine, chemokine and 
co-stimulatory molecules as either protein or peptide, including for example 
pro-inflammatory cytokines such as Interferon, GM-CSF, IL-1 alpha, IL-1 beta, TGF-alpha and 
TGF-beta, Th1 inducers such as interferon gamma, IL-2, IL-12, IL-15, IL-18 and IL-21, 
Th2 inducers such as IL-4, IL-5, IL-6, IL-10 and IL-13 and other chemokine and 
co-stimulatory genes such as MCP-1, MIP-1 alpha, MIP-1 beta, RANTES, TCA-3, CD80, 
CD86 and CD40L; other immunostimulatory targeting ligands such as CTLA-4 and 
L-selectin; apoptosis stimulating proteins and peptides such as Fas (49); synthetic lipid 
based adjuvants, such as vaxfectin (Reyes et al., Vaccine 19: 3778-3786, 2001), 
squalene, alpha-tocopherol, polysorbate 80, DOPC and cholesterol; endotoxin [LPS] 
(Beutler, B., Current Opinion in Microbiology 3: 23-30, 2000); CpG oligo- and 
di-nucleotides (Sato, Y. et al., Science 273 (5273): 352-354, 1996; Hemmi, H. et al., Nature 
408: 740-745, 2000); and other potential ligands that trigger Toll receptors to produce 
Th1-inducing cytokines, such as synthetic Mycobacterial lipoproteins, Mycobacterial protein 
p19, peptidoglycan, teichoic acid and lipid A. 

Other suitable adjuvants include CT (cholera toxin, subunits A and B) and LT (heat 
labile enterotoxin from E. coli, subunits A and B), heat shock protein family (HSPs), and 
LLO (listeriolysin O; WO 01/72329). 

Within further aspects, the present invention provides methods for stimulating an 
immune response in a patient, preferably a T cell response in a human patient, comprising 
administering a pharmaceutical composition described herein. The patient may be 
afflicted with lung, colon, colorectal or breast cancer, in which case the 
methods provide treatment for the disease, or a patient considered at risk for such a disease 
may be treated prophylactically. 

Within further aspects, the present invention provides methods for inhibiting the 
development of a cancer in a patient, comprising administering to a patient a 
pharmaceutical composition as recited above. The patient may be afflicted with, for 
example, sarcoma, prostate, ovarian, bladder, lung, colon, colorectal or breast cancer, in 
which case the methods provide treatment for the disease, or a patient considered at risk for 
such a disease may be treated prophylactically. 

The present invention further provides, within other aspects, methods for removing 
tumor cells from a biological sample, comprising contacting a biological sample with T cells 
that specifically react with a polypeptide of the present invention, wherein the step of 
contacting is performed under conditions and for a time sufficient to permit the removal of 
cells expressing the protein from the sample. 

Within related aspects, methods are provided for inhibiting the development of a 
cancer in a patient, comprising administering to a patient a biological sample treated as 
described above. 

Methods are further provided, within other aspects, for stimulating and/or 
expanding T cells specific for a polypeptide of the present invention, comprising contacting 
T cells with one or more of: (i) a polypeptide as described above; (ii) a polynucleotide 
encoding such a polypeptide; and/or (iii) an antigen presenting cell that expresses such a 
polypeptide; under conditions and for a time sufficient to permit the stimulation and/or 
expansion of T cells. Isolated T cell populations comprising T cells prepared as described 
above are also provided. 

Within further aspects, the present invention provides methods for inhibiting the 
development of a cancer in a patient, comprising administering to a patient an effective 
amount of a T cell population as described above. 

The present invention further provides methods for inhibiting the development of a 
cancer in a patient, comprising the steps of: (a) incubating CD4+ and/or CD8+ T cells 
isolated from a patient with one or more of: (i) a polypeptide disclosed herein; (ii) a 
polynucleotide encoding such a polypeptide; and (iii) an antigen-presenting cell that 
expresses such a polypeptide; and (b) administering to the patient an effective amount of 
the proliferated T cells, and thereby inhibiting the development of a cancer in the patient. 
Proliferated cells may, but need not, be cloned prior to administration to the patient. 

According to another embodiment of this invention, an immunogenic composition 
described herein is delivered to a host via antigen presenting cells (APCs), such as 
dendritic cells, macrophages, B cells, monocytes and other cells that may be engineered 
to be efficient APCs. Such cells may, but need not, be genetically modified to increase the 
capacity for presenting the antigen, to improve activation and/or maintenance of the T cell 
response, to have anti-tumor effects per se and/or to be immunologically compatible with 
the receiver (i.e., matched HLA haplotype). APCs may generally be isolated from any of a 
variety of biological fluids and organs, including tumor and peritumoral tissues, and may be 
autologous, allogeneic, syngeneic or xenogeneic cells. 

Certain preferred embodiments of the present invention use dendritic cells or 
progenitors thereof as antigen-presenting cells. Dendritic cells are highly potent APCs 
(Banchereau and Steinman, Nature 392:245-251, 1998) and have been shown to be 
effective as a physiological adjuvant for eliciting prophylactic or therapeutic antitumor 
immunity (see Timmerman and Levy, Ann. Rev. Med. 50:507-529, 1999). In general, 
dendritic cells may be identified based on their typical shape (stellate in situ, with marked 
cytoplasmic processes (dendrites) visible in vitro), their ability to take up, process and 
present antigens with high efficiency and their ability to activate naive T cell responses. 
Dendritic cells may, of course, be engineered to express specific cell-surface receptors or 
ligands that are not commonly found on dendritic cells in vivo or ex vivo, and such modified 
dendritic cells are contemplated by the present invention. As an alternative to dendritic 
cells, vesicles secreted by antigen-loaded dendritic cells (called exosomes) may be used 
within a vaccine (see Zitvogel et al., Nature Med. 4:594-600, 1998). 

Dendritic cells and progenitors may be obtained from peripheral blood, bone 
marrow, tumor-infiltrating cells, peritumoral tissues-infiltrating cells, lymph nodes, spleen, 
skin, umbilical cord blood or any other suitable tissue or fluid. For example, dendritic cells 
may be differentiated ex vivo by adding a combination of cytokines such as GM-CSF, IL-4, 
IL-13 and/or TNFα to cultures of monocytes harvested from peripheral blood. 
Alternatively, CD34 positive cells harvested from peripheral blood, umbilical cord blood or 
bone marrow may be differentiated into dendritic cells by adding to the culture medium 
combinations of GM-CSF, IL-3, TNFα, CD40 ligand, LPS, flt3 ligand and/or other 
compound(s) that induce differentiation, maturation and proliferation of dendritic cells. 

Dendritic cells are conveniently categorized as "immature" and "mature" cells, 
which allows a simple way to discriminate between two well characterized phenotypes. 
However, this nomenclature should not be construed to exclude all possible intermediate 
stages of differentiation. Immature dendritic cells are characterized as APC with a high 
capacity for antigen uptake and processing, which correlates with the high expression of 
Fcγ receptor and mannose receptor. The mature phenotype is typically characterized by a 
lower expression of these markers, but a high expression of cell surface molecules 
responsible for T cell activation such as class I and class II MHC, adhesion molecules 
(e.g., CD54 and CD11) and costimulatory molecules (e.g., CD40, CD80, CD86 and 4- 
1BB). 

APCs may generally be transfected with a polynucleotide of the invention (or 
portion or other variant thereof) such that the encoded polypeptide, or an immunogenic 
portion thereof, is expressed on the cell surface. Such transfection may take place ex 
vivo, and a pharmaceutical composition comprising such transfected cells may then be 
used for therapeutic purposes, as described herein. Alternatively, a gene delivery vehicle 
that targets a dendritic or other antigen presenting cell may be administered to a patient, 
resulting in transfection that occurs in vivo. In vivo and ex vivo transfection of dendritic 
cells, for example, may generally be performed using any methods known in the art, such 
as those described in WO 97/24447, or the gene gun approach described by Mahvi et al., 
Immunology and Cell Biology 75:456-460, 1997. Antigen loading of dendritic cells may be 
achieved by incubating dendritic cells or progenitor cells with the tumor polypeptide, DNA 
(naked or within a plasmid vector) or RNA; or with antigen-expressing recombinant 
bacteria or viruses (e.g., vaccinia, fowlpox, adenovirus or lentivirus vectors). Prior to 
loading, the polypeptide may be covalently conjugated to an immunological partner that 
provides T cell help (e.g., a carrier molecule). Alternatively, a dendritic cell may be pulsed 
with a non-conjugated immunological partner, separately or in the presence of the 
polypeptide. 

Definitions 

Also provided by the invention are methods for the analysis of character sequences 
or strings, particularly genetic sequences or encoded protein sequences. Preferred methods 
of sequence analysis include, for example, methods of sequence homology analysis, such 
as identity and similarity analysis, DNA, RNA and protein structure analysis, sequence 
assembly, cladistic analysis, sequence motif analysis, open reading frame determination, 
nucleic acid base calling, codon usage analysis, nucleic acid base trimming, and sequencing 
chromatogram peak analysis. 

A computer based method is provided for performing homology identification. This 
method comprises the steps of: providing a first polynucleotide sequence comprising the 
sequence of a polynucleotide of the invention in a computer readable medium; and 
comparing said first polynucleotide sequence to at least one second polynucleotide or 
polypeptide sequence to identify homology. 

A computer based method is also provided for performing homology identification, 
said method comprising the steps of: providing a first polypeptide sequence comprising the 
sequence of a polypeptide of the invention in a computer readable medium; and comparing 
said first polypeptide sequence to at least one second polynucleotide or polypeptide 
sequence to identify homology. 

All publications and references, including but not limited to patents and patent 
applications, cited in this specification are herein incorporated by reference in their entirety 
as if each individual publication or reference were specifically and individually indicated to be 
incorporated by reference herein as being fully set forth. Any patent application to which this 
application claims priority is also incorporated by reference herein in its entirety in the 
manner described above for publications and references. 

"Identity," as known in the art, is a relationship between two or more polypeptide sequences or 
two or more polynucleotide sequences, as the case may be, as determined by comparing the 
sequences. In the art, "identity" also means the degree of sequence relatedness between 
polypeptide or polynucleotide sequences, as the case may be, as determined by the match 
between strings of such sequences. "Identity" can be readily calculated by known methods, 
including but not limited to those described in Computational Molecular Biology, Lesk, A.M., 
ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome 
Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of 
Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 
1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; 
Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New 
York, 1991; and Carillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073 (1988). 
Methods to determine identity are designed to give the largest match between the 
sequences tested. Moreover, methods to determine identity are codified in publicly available 
computer programs. Computer program methods to determine identity between two 
sequences include, but are not limited to, the GAP program in the GCG program package 
(Devereux, J., et al., Nucleic Acids Research 12(1): 387 (1984)), BLASTP, BLASTN 
(Altschul, S.F. et al., J. Molec. Biol. 215: 403-410 (1990)), and FASTA (Pearson and Lipman, 
Proc. Natl. Acad. Sci. USA 85: 2444-2448 (1988)). The BLAST family of programs is publicly 
available from NCBI and other sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH 
Bethesda, MD 20894; Altschul, S., et al., J. Mol. Biol. 215: 403-410 (1990)). The well known 
Smith Waterman algorithm may also be used to determine identity. 

Parameters for polypeptide sequence comparison include the following: 
Algorithm: Needleman and Wunsch, J. Mol. Biol. 48: 443-453 (1970) 
Comparison matrix: BLOSUM62 from Henikoff and Henikoff, 
Proc. Natl. Acad. Sci. USA 89:10915-10919 (1992) 
Gap Penalty: 8 
Gap Length Penalty: 2 

A program useful with these parameters is publicly available as the "gap" program from 
Genetics Computer Group, Madison WI. The aforementioned parameters are the default 
parameters for peptide comparisons (along with no penalty for end gaps). 

Parameters for polynucleotide comparison include the following: 
Algorithm: Needleman and Wunsch, J. Mol. Biol. 48: 443-453 (1970) 
Comparison matrix: matches = +10, mismatch = 0 
Gap Penalty: 50 
Gap Length Penalty: 3 

Available as: The "gap" program from Genetics Computer Group, Madison WI. These are 
the default parameters for nucleic acid comparisons. 
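
By way of illustration only, the nucleic acid parameters listed above can be exercised in a short Python sketch of a global (Needleman and Wunsch type) alignment score. The sketch is not the "gap" program and is not asserted to reproduce its output. The function name global_alignment_score is hypothetical, and two assumptions are made that the text above does not state: a gap of k positions is charged the Gap Penalty once plus the Gap Length Penalty k times, and gaps at the ends of a sequence are charged in the same way as internal gaps.

```python
NEG_INF = float("-inf")


def global_alignment_score(a, b, match=10, mismatch=0, gap_open=50, gap_extend=3):
    """Score of the best global alignment of strings a and b.

    Assumption (not taken from the specification): a gap of length k costs
    gap_open + k * gap_extend, and end gaps are penalised like internal gaps.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)

    new_gap = gap_open + gap_extend  # cost of the first position of a new gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - new_gap,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - new_gap)
            Y[i][j] = max(M[i][j - 1] - new_gap,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - new_gap)
    return max(M[n][m], X[n][m], Y[n][m])
```

Under these assumptions, two identical four-nucleotide sequences score 40 (four matches at +10 each), and removing one internal nucleotide from one of them lowers the best score to -23 (three matches less one gap of length one).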

A preferred meaning for "identity" for polynucleotides and polypeptides, as the case 
may be, is provided in (1) and (2) below. 

(1) Polynucleotide embodiments further include an isolated polynucleotide comprising a 
polynucleotide sequence having at least a 50, 60, 70, 80, 85, 90, 95, 97 or 100% identity to 
any of the reference sequences of SEQ ID NO:9 to SEQ ID NO:16, wherein said 
polynucleotide sequence may be identical to any of the reference sequences of SEQ ID NO:9 
to SEQ ID NO:16 or may include up to a certain integer number of nucleotide alterations as 
compared to the reference sequence, wherein said alterations are selected from the group 
consisting of at least one nucleotide deletion, substitution, including transition and 
transversion, or insertion, and wherein said alterations may occur at the 5' or 3' terminal 
positions of the reference nucleotide sequence or anywhere between those terminal 
positions, interspersed either individually among the nucleotides in the reference sequence 
or in one or more contiguous groups within the reference sequence, and wherein said 
number of nucleotide alterations is determined by multiplying the total number of nucleotides 
in any of SEQ ID NO:9 to SEQ ID NO:16 by the integer defining the percent identity divided 
by 100 and then subtracting that product from said total number of nucleotides in any of 
SEQ ID NO:9 to SEQ ID NO:16, or 

$n_n \leq x_n - (x_n \cdot y)$, 

wherein $n_n$ is the number of nucleotide alterations, $x_n$ is the total number of nucleotides in 
any of SEQ ID NO:9 to SEQ ID NO:16, y is 0.50 for 50%, 0.60 for 60%, 0.70 for 70%, 0.80 
for 80%, 0.85 for 85%, 0.90 for 90%, 0.95 for 95%, 0.97 for 97% or 1.00 for 100%, and $\cdot$ is 
the symbol for the multiplication operator, and wherein any non-integer product of $x_n$ and y 
is rounded down to the nearest integer prior to subtracting it from $x_n$. Alterations of 
polynucleotide sequences encoding the polypeptides of any of SEQ ID NO:1 to SEQ ID 
NO:8 may create nonsense, missense or frameshift mutations in this coding sequence and 
thereby alter the polypeptide encoded by the polynucleotide following such alterations. 
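
By way of illustration only, the upper bound on the number of alterations described above can be computed as in the following Python sketch. The function name max_alterations and the sequence length used in the example are hypothetical and are introduced here solely for the arithmetic; the lengths of SEQ ID NO:9 to SEQ ID NO:16 are not assumed. Integer arithmetic is used so that the product of the total length and y is rounded down exactly.

```python
def max_alterations(total_length: int, percent_identity: int) -> int:
    """Largest number of alterations n satisfying n <= x - floor(x * y),
    where x is total_length and y is percent_identity / 100."""
    if total_length < 0:
        raise ValueError("total_length must be non-negative")
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent_identity must be between 0 and 100")
    # floor(x * y), computed with integers to avoid floating-point rounding
    rounded_product = (total_length * percent_identity) // 100
    return total_length - rounded_product


if __name__ == "__main__":
    # Hypothetical reference sequence of 1000 nucleotides at 95% identity
    print(max_alterations(1000, 95))
```

For a hypothetical reference sequence of 1000 nucleotides and 95% identity, the sketch gives at most 50 alterations; at 100% identity it gives none.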

By way of example, a polynucleotide sequence of the present invention may be 
identical to any of the reference sequences of SEQ ID NO:9 to SEQ ID NO:16, that is it may 
be 100% identical, or it may include up to a certain integer number of nucleic acid alterations 
as compared to the reference sequence such that the percent identity is less than 100% 
identity. Such alterations are selected from the group consisting of at least one nucleic acid 
deletion, substitution, including transition and transversion, or insertion, and wherein said 
alterations may occur at the 5' or 3' terminal positions of the reference polynucleotide 
sequence or anywhere between those terminal positions, interspersed either individually 
among the nucleic acids in the reference sequence or in one or more contiguous groups 
within the reference sequence. The number of nucleic acid alterations for a given percent 
identity is determined by multiplying the total number of nucleic acids in any of SEQ ID NO:9 
to SEQ ID NO:16 by the integer defining the percent identity divided by 100 and then 
subtracting that product from said total number of nucleic acids in any of SEQ ID NO:9 to 
SEQ ID NO:16, or 

$n_n \leq x_n - (x_n \cdot y)$, 

wherein $n_n$ is the number of nucleic acid alterations, $x_n$ is the total number of nucleic acids 
in any of SEQ ID NO:9 to SEQ ID NO:16, y is, for instance, 0.70 for 70%, 0.80 for 80%, 0.85 
for 85% etc., $\cdot$ is the symbol for the multiplication operator, and wherein any non-integer 
product of $x_n$ and y is rounded down to the nearest integer prior to subtracting it from $x_n$. 

(2) Polypeptide embodiments further include an isolated polypeptide comprising a 
polypeptide having at least a 50, 60, 70, 80, 85, 90, 95, 97 or 100% identity to the 
polypeptide reference sequence of any of SEQ ID NO:1 to SEQ ID NO:8, wherein said 
polypeptide sequence may be identical to any of the reference sequences of SEQ ID NO:1 to 
SEQ ID NO:8 or may include up to a certain integer number of amino acid alterations as 
compared to the reference sequence, wherein said alterations are selected from the group 
consisting of at least one amino acid deletion, substitution, including conservative and non- 
conservative substitution, or insertion, and wherein said alterations may occur at the amino- 
or carboxy-terminal positions of the reference polypeptide sequence or anywhere between 
those terminal positions, interspersed either individually among the amino acids in the 
reference sequence or in one or more contiguous groups within the reference sequence, 
and wherein said number of amino acid alterations is determined by multiplying the total 
number of amino acids in any of SEQ ID NO:1 to SEQ ID NO:8 by the integer defining the 
percent identity divided by 100 and then subtracting that product from said total number of 
amino acids in any of SEQ ID NO:1 to SEQ ID NO:8, or 

n_a ≤ x_a - (x_a • y), 

wherein n_a is the number of amino acid alterations, x_a is the total number of amino acids in 
SEQ ID NO:2, y is 0.50 for 50%, 0.60 for 60%, 0.70 for 70%, 0.80 for 80%, 0.85 for 85%, 
0.90 for 90%, 0.95 for 95%, 0.97 for 97% or 1.00 for 100%, and • is the symbol for the 
multiplication operator, and wherein any non-integer product of x_a and y is rounded down to 
the nearest integer prior to subtracting it from x_a. 

By way of example, a polypeptide sequence of the present invention may be identical 
to the reference sequence of any of SEQ ID NO:1 to SEQ ID NO:8, that is it may be 100% 
identical, or it may include up to a certain integer number of amino acid alterations as 
compared to the reference sequence such that the percent identity is less than 100% 
identity. Such alterations are selected from the group consisting of at least one amino acid 
deletion, substitution, including conservative and non-conservative substitution, or insertion, 
and wherein said alterations may occur at the amino- or carboxy-terminal positions of the 
reference polypeptide sequence or anywhere between those terminal positions, interspersed 
either individually among the amino acids in the reference sequence or in one or more 
contiguous groups within the reference sequence. The number of amino acid alterations for 
a given % identity is determined by multiplying the total number of amino acids in any of 
SEQ ID NO:1 to SEQ ID NO:8 by the integer defining the percent identity divided by 100 and 
then subtracting that product from said total number of amino acids in any of SEQ ID NO:1 
to SEQ ID NO:8, or 

n_a ≤ x_a - (x_a • y), 

wherein n_a is the number of amino acid alterations, x_a is the total number of amino acids in 
any of SEQ ID NO:1 to SEQ ID NO:8, y is, for instance 0.70 for 70%, 0.80 for 80%, 0.85 for 
85% etc., and • is the symbol for the multiplication operator, and wherein any non-integer 
product of x_a and y is rounded down to the nearest integer prior to subtracting it from x_a. 
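
As a worked illustration of the relation n ≤ x - (x • y) with the product rounded down, the following Python sketch computes the maximum permitted number of alterations. The function name is hypothetical, and the length of 318 residues is only a convenient number (the length given for LytA earlier in this description); it is not asserted to be the length of any of the SEQ ID NOs.

```python
def max_alterations(total_residues: int, percent_identity: int) -> int:
    """Largest n satisfying n <= x - floor(x * y), with y = percent_identity / 100."""
    if total_residues < 0:
        raise ValueError("total_residues must be non-negative")
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent_identity must lie between 0 and 100")
    # floor(x * y) is computed with integer arithmetic so that no
    # floating-point rounding can change the result.
    retained = (total_residues * percent_identity) // 100
    return total_residues - retained


if __name__ == "__main__":
    # Illustrative reference length only (assumption, see lead-in).
    for pct in (50, 70, 85, 97, 100):
        print(pct, max_alterations(318, pct))
```

For a 318-residue reference at 85% identity, the product 318 • 0.85 = 270.3 is rounded down to 270, so up to 48 alterations are permitted.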

The invention will be further described by reference to the following examples: 

EXAMPLE I: Preparation of the recombinant yeast strain Y1796 expressing P501 
fusion protein containing C-LytA-P2-C-LytA (CPC) as fusion partner 

1. - Protein design 

The structure of the fusion protein C-P2-C-p501 (alternatively named CPC-P501) to 
be expressed in S. cerevisiae is depicted in figure 3. This fusion contains the C-terminal 
region of gene LytA (residues 187 to 306), in which the P2 fragment of tetanus toxin 
(residues 830-843) has been inserted. The P2 fragment is placed between the residues 
278 and 279 of C-LytA. The C-lytA fragment containing the P2 insertion is followed by 
P501 (amino acid residues 51 to 553) and by the His tail. 

The primary structure of the resulting fusion protein has the sequence described in 
figure 4 and the coding sequence corresponding to the above protein design is in figure 5. 

2. - Cloning strategy for the generation of a yeast plasmid expressing CPC-P501 
(51-553)-His fusion protein 

• The starting material is the yeast vector pRIT15068 (UK patent application 0015619.0). 

• This vector contains the yeast Cup1 promoter, the yeast alpha prepro signal coding 
sequence and the coding sequence corresponding to residues 55 to 553 of P501S 
followed by His tail. 

• The cloning strategy outlined in figure 6 includes the following steps: 

a) The first step is the insertion of the P2 sequence in frame, inside the C-lytA coding 
sequence. The C-lytA coding sequence is harbored by plasmid pRIT 14662 
(PCT/EP99/00660). The insertion is done using an adaptor formed by two 
complementary oligonucleotides named P21 and P22, ligated into plasmid pRIT 14662 
previously opened with NcoI. 

The sequences of P21 and P22 are: 

P21 5' catgcaatacatcaaggctaactctaagttcattggtatcactgaaggcgt 3' 
P22 3' gttatgtagttccgattgagattcaagtaaccatagtgacttccgcagtac 5' 

After ligation, transformation of E. coli and transformant characterization, the 
plasmid named pRIT15199 is obtained. 

b) The second step is the preparation of the C-lytA-P2-C-lytA DNA fragment by PCR 
amplification. The amplification is performed using pRIT15199 as template and the 
oligonucleotides named C-LytANOTATG and C-LytA-aa55. The sequences of the two 
oligonucleotides are: 

C-LytANOTATG = 5' aaaaccatggcggccgcttacgtacattccgacggctcttatccaaaagacaag 3' 
C-LytA-aa55 = 5' aaacatgtacatgaacttttctggcctgtctgccagtgttc 3' 

The amplified fragment is treated with the restriction enzymes NcoI and AflIII to 
generate the respective cohesive ends. 

c) The next step is the ligation of the above fragment with vector pRIT15068 (largest 
fragment obtained after NcoI treatment) to generate the complete fusion protein coding 
sequence. After ligation and E. coli transformation the plasmid named pRIT15200 is 
obtained. In this plasmid the remaining unique NcoI site contains the ATG coding for the 
start codon. 

d) In the next step an NcoI fragment containing the CUP1 promoter and a portion of 2µ 
plasmid sequences is prepared from plasmid pRIT 15202. Plasmid pRIT 15202 is a yeast 
2µ derivative containing the CUP1 promoter with an NcoI site at the ATG (ATG sequence: 
AAACC ATG). 

e) The NcoI fragment isolated from pRIT 15202 is ligated to pRIT15200, previously 
opened with NcoI, in the right orientation, in such a way that the pCUP1 promoter is at the 5' side 
of the coding sequence. This results in the generation of the final expression plasmid named 
pRIT15201 (see figure 7). 
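
The design of the P21/P22 adaptor used in step a) can be checked with a short Python sketch. It verifies that the two oligonucleotides are complementary over their paired region, that annealing leaves 4-nucleotide 5' overhangs of sequence catg (the overhang produced by NcoI), and it translates the insert. The reading frame used (starting at the atg inside the catg overhang) is an assumption made for illustration, as are all variable and function names.

```python
# Oligonucleotides as given in step a).
P21 = "catgcaatacatcaaggctaactctaagttcattggtatcactgaaggcgt"       # written 5'->3'
P22_3TO5 = "gttatgtagttccgattgagattcaagtaaccatagtgacttccgcagtac"  # written 3'->5'

COMPLEMENT = str.maketrans("acgt", "tgca")


def build_codon_table():
    """Standard genetic code, built from the usual compact TCAG-ordered string."""
    bases = "tcag"
    amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    codons = [a + b + c for a in bases for b in bases for c in bases]
    return dict(zip(codons, amino_acids))


CODON_TABLE = build_codon_table()


def translate(dna):
    """Translate complete codons only; trailing 1-2 nucleotides are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


OVERHANG = 4
# In the annealed duplex, P22 pairs with P21 minus its first four nucleotides.
paired_top = P21[OVERHANG:]
paired_bottom = P22_3TO5[:len(P22_3TO5) - OVERHANG]
duplex_ok = paired_top.translate(COMPLEMENT) == paired_bottom

top_overhang = P21[:OVERHANG]
bottom_overhang = P22_3TO5[-OVERHANG:][::-1]  # re-read in the 5'->3' direction

# Assumed frame: the atg inside the catg overhang is read as a codon.
insert_peptide = translate(P21[1:])

if __name__ == "__main__":
    print("duplex complementary:", duplex_ok)
    print("overhangs (5'->3'):", top_overhang, bottom_overhang)
    print("translation in assumed frame:", insert_peptide)
```

In this frame the adaptor encodes a 14-residue core flanked by one residue on each side, which agrees in length with the 14-residue span (830-843) stated for the P2 fragment.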

3. - Preparation of the recombinant yeast strain Y1796 (RIX4440) 

The plasmid pRIT 15201 is used to transform the S. cerevisiae strain DC5 (ATCC 
20820). After selection and characterisation of the yeast transformants containing the 
plasmid pRIT 15201 a recombinant yeast strain named Y1796 expressing CPC-P501-His 
fusion protein is obtained. The protein, after reduction and carboxyamidation, is isolated 
and purified by affinity chromatography (IMAC) followed by anion exchange 
chromatography (Q Sepharose FF). 

Example II 

In analogous fashion, protein constructs as depicted in figure 2 may be expressed 
utilising the corresponding DNA sequences shown therein. In particular, yeast strain 
SC333 (construct 2) corresponds to the Y1796 strain but expresses P501 55-553 devoid of the 
CPC fusion partner. Yeast strain Y1800 (construct 3) corresponds to the Y1796 strain but 
additionally comprises the native signal sequence for P501S (aa1-aa34), while yeast strain 
Y1802 (construct 4) comprises the alpha pre signal sequence upstream of the P501S sequence. 
Yeast strain Y1790 (construct 5) expresses a P501S construct devoid of CPC and 
having the alpha prepro signal sequence. 

Example III. Preparation of purified CPC-P501 

1. - Production of CPC-P501S HIS (Y1796) at small scale 

For Y1796, in minimal medium supplemented with histidine, expression is induced in log 
phase by addition of CuSO4 ranging from 100 to 500 µM, and the culture is maintained at 30°C. 
Cells are harvested after 8 or 24 h of induction. Copper is added just before use and not 
mixed with medium in advance. 

For SDS-PAGE analysis, yeast cell extraction is performed in citrate phosphate buffer 
pH 4.0 + 130 mM NaCl. Extraction is performed with glass beads for small cell quantities and 
with a French press for larger cell quantities; the extract is then mixed with sample buffer and 
analysed by SDS-PAGE. 

As shown in Table 1 below, the level of expression of the culture is much higher for Y1796 
strain as compared to the expression level of parent strain SC333, a strain expressing the 
corresponding P501S-His devoid of CPC partner. Likewise, the presence of a signal 
sequence (alpha pre) does not affect the results discussed above: the level of expression 
of the culture is much higher for Y1802 strain as compared to the expression level of 
corresponding strain Y1790, a strain expressing the corresponding P501S-His devoid of 
CPC partner. 

CPC = clyta P2 clyta 
ND = not detectable 



2. - Fermentation of Y1796 (RIX4440) at larger scale 

100 µl of the working seed are spread on solid medium and grown for approximately 24h at 
30°C. This solid pre-culture is then used to inoculate a liquid pre-culture in shake flasks. 
This liquid pre-culture is grown for 20h at 30°C and transferred into a 20L fermenter. The 
fed-batch fermentation includes a growth phase of about 44h and an induction phase of 
about 22h. 

The carbon source (glucose) was supplemented to the culture by a continuous feeding. 
The residual glucose concentration was maintained very low (<50mg/L) in order to 
minimise the ethanol production by fermentation. This was achieved by limiting the 
growth of the micro-organism through a restricted glucose feed rate. 

At the end of the growth phase, the CUP1 promoter is induced by adding CuSO4 in order to 
produce the antigen. 

The absence of contamination was checked by inoculating 10^6 cells into standard TSB 
and THI vials supplemented with nystatin and incubated respectively for 14 days at 
20-25°C and at 30-35°C. No growth was observed, as expected. 

3. - Antigen characterisation and productivity 

Cell homogenates were prepared by French pressing of fermentation samples harvested at 
different times during the induction phase and analysed by SDS-PAGE and Western Blot. It 
was shown that the major part of the protein of interest was located in the insoluble fraction 
obtained from the cell homogenate after centrifugation. The SDS-PAGE and Western Blot 
analyses shown in the Figures below were performed on the pellets obtained after 
centrifugation of these cell homogenates. 

Figures 8A and 8B show the kinetics of antigen production during the induction phase for 
culture PR0127. It appears that no antigen expression occurred during the growth phase. 
The specific antigen productivity appeared to increase from the beginning of the induction 
phase up to 6h and then remained quite stable up to the end. However, the volumetric 
productivity increased by a factor of 1.5 to 2 owing to the biomass accumulation observed during 
the same period of time. The antigen productivity was estimated at about 500 mg per litre 
of fermentation broth by comparing a purified reference of the antigen and crude extracts by 
SDS-PAGE with silver staining (figure 8A) and WB analyses using an anti-P501S antibody 
(a murine ascites directed against P501S aa439-aa459 used at a dilution of 1/1000) (figure 
8B). 

Example IV. Purification of CPC-P501 (51-553)-His fusion protein produced by Y1796 

After the cell breakage, the protein is associated with the pellet fraction. A carbamido- 
methylation of the molecule has been introduced in the process in order to cope with the 
oxidative aggregation of the molecule with itself or with host cell protein contaminants 
through covalent bridging with disulphide bonds. The use of detergents has also been 
required to manage the hydrophobic character of this protein (12 trans-membrane domains 
predicted). 

The purification protocol, developed for the scale of 1 L of culture OD (optical 
density) 120, is described in figure 9. All the operations are performed at room 
temperature (RT). 

According to DOC TCA BCA protein assay, the global purification yield is 30 - 70 mg of 
purified antigen / L of culture OD 120. The yield is linked to the level of expression of the 
culture and is higher as compared to the purification yield of parent strain expressing 
unfused P501S-His. 

The protein assay is performed as follows: proteins are first precipitated using TCA 
(trichloroacetic acid) in the presence of DOC (deoxycholate) then dissolved in an alkaline 
medium in the presence of SDS. The proteins then react with BCA (bicinchoninic acid) 
(Pierce) to form a soluble purple complex presenting a high absorbance at 562 nm, which 
is proportional to the amount of protein present in the sample. 

SDS-PAGE analysis of 3 purified bulks (figure 10) shows no difference between reducing and 
non-reducing conditions (cf. lanes 2, 3 and 4 versus lanes 5, 6 and 7). The pattern 
consists of a major band at 70 kDa, a smear of higher MW and faint degradation bands. All 
the bands are detected by a specific anti-P501S monoclonal antibody. 



Example V. Vaccine preparation using CPC-P501S His protein 

The protein of Example 3 or 4 can be formulated into a vaccine containing QS21 
and 3D-MPL in an oil in water emulsion. 

1. — Vaccine preparation: 

The antigen, produced as shown in Examples 1 to 3, is C-LytA - P2 - P501S His. As 
an adjuvant, the formulation comprises a mixture of 3-de-O-acylated monophosphoryl lipid 
A (3D-MPL) and QS21 in an oil/water emulsion. The adjuvant system SBAS2 has been 
previously described in WO 95/17210. 

3D-MPL: is an immunostimulant derived from the lipopolysaccharide (LPS) of the 
Gram-negative bacterium Salmonella minnesota. MPL has been deacylated and is lacking 
a phosphate group on the lipid A moiety. This chemical treatment dramatically reduces 
toxicity while preserving the immunostimulant properties (Ribi, 1986). Ribi 
Immunochemistry produces and supplies MPL to SB-Biologicals. 
Experiments performed at SmithKline Beecham Biologicals have shown that 
3D-MPL combined with various vehicles strongly enhances both the humoral and a TH1 
type of cellular immunity. 

QS21: is a natural saponin molecule extracted from the bark of the South American 
tree Quillaja saponaria Molina. A purification technique developed to separate the 
individual saponins from the crude extracts of the bark, permitted the isolation of the 
particular saponin, QS21 , which is a triterpene glycoside demonstrating stronger adjuvant 
activity and lower toxicity as compared with the parent component. QS21 has been shown 
to activate MHC class I restricted CTLs to several subunit Ags, as well as to stimulate Ag 
specific lymphocytic proliferation (Kensil, 1992). Aquila (formerly Cambridge Biotech 
Corporation) produces and supplies QS21 to SB-Biologicals. 

Experiments performed at SmithKline Beecham Biologicals have demonstrated a 
clear synergistic effect of combinations of MPL and QS21 in the induction of both humoral 
and TH1 type cellular immune responses. 

The oil/water emulsion is composed of an organic phase made of 2 oils 
(α-tocopherol and squalene), and an aqueous phase of PBS containing Tween 80 as 
emulsifier. The emulsion comprised 5% squalene, 5% tocopherol and 0.4% Tween 80, had 
an average particle size of 180 nm and is known as SB62 (see WO 95/17210). 

Experiments performed at SmithKline Beecham Biologicals have proven that the 
addition of this O/W emulsion to 3D-MPL/QS21 (SBAS2) further increases the 
immunostimulant properties of the latter against various subunit antigens. 

2. - Preparation of emulsion SB62 (2 fold concentrate): 

Tween 80 is dissolved in phosphate buffered saline (PBS) to give a 2% solution in 
the PBS. To provide 100 ml of two-fold concentrate emulsion, 5g of DL alpha tocopherol and 
5ml of squalene are vortexed to mix thoroughly. 90ml of PBS/Tween solution is added and 
mixed thoroughly. The resulting emulsion is then passed through a syringe and finally 
microfluidised by using an M110S microfluidics machine. The resulting oil droplets have a 
size of approximately 180 nm. 

3. - Formulations: 

A typical formulation containing 3D-MPL and QS21 in an oil/water emulsion is 
prepared as follows: 20 µg - 25 µg C-LytA P2-P501S are diluted in 10-fold concentrated 
PBS pH 6.8 and H2O before consecutive addition of SB62 (50 µl), MPL (20 µg), QS21 
(20 µg), optionally comprising CpG oligonucleotide (100 µg) and 1 µg/ml thiomersal as 
preservative. The amount of each component may vary as necessary. All incubations are 
carried out at room temperature with agitation. 

Example VI. Codon-optimised P501S sequences 
1.- Generation of the control recombinant plasmids: 

Full-length P501S sequence was cloned into pVAC (Thomsen, Immunology, 1998; 
95:51OP105), generating expression plasmid JNW680. SEQ ID NO:17 represents human 
P501S expression cassette in the plasmid JNW680 and is illustrated in Figure 11. The 
protein sequence of SEQ ID NO: 17 is shown in single letter format, the start and stop 
codons being shown in bold. The Kozak sequence is denoted by the hash symbols. The 
codon usage index of the human P501S sequence (SEQ ID NO:17) is 0.618, as calculated 
by the SynGene programme. 

SynGene programme 

Basically, the codons are assigned using a statistical method to give a synthetic gene 
having a codon frequency closer to that found naturally in highly expressed E.coli and 
human genes. 

SynGene is an updated version of the Visual Basic program called Calcgene, 
written by R. S. Hale and G Thompson (Protein Expression and Purification Vol. 12 
pp. 185-188 (1998)). For each amino acid residue in the original sequence, a codon was 
assigned based on the probability of it appearing in highly expressed E.coli genes. Details 
of the Calcgene program, which works under Microsoft Windows 3.1, can be obtained from 
the authors. Because the program applies a statistical method to assign codons to the 
synthetic gene, not all resulting codons are the most frequently used in the target 
organism. Rather, the proportion of frequently and infrequently used codons of the target 
organism is reflected in the synthetic sequence by assigning codons in the correct 
proportions. However, as there is no hard-and-fast rule assigning a particular codon to a 
particular position in the sequence, each time it is run the program will produce a different 
synthetic gene - although each will have the same codon usage pattern and each will 
encode the same amino acid sequence. If the program is run several times for a given 
amino acid sequence and a given target organism, several different nucleotide sequences 
will be produced which may differ in the number, type and position of restriction sites, 
intron splice signals etc., some of which may be undesirable. The skilled artisan will be 
able to select an appropriate sequence for use in expression of the polypeptide on the 
basis of these features. 

Furthermore, since the codons are randomly assigned on a statistical basis, it is 
possible (although perhaps unlikely) that two or more codons which are relatively rarely 
used in the target organism might be clustered in close proximity. It is believed that such 
clusters may upset the machinery of translation and result in particularly low expression 
rates, so the algorithm for choosing the codons in the optimized gene excludes any codons 
with an RSCU value of less than 0.2 for highly expressed genes in order to prevent any 
rare codon clusters being fortuitously selected. The distribution of the remaining codons is 
then allocated according to the frequencies for highly expressed E.coli to give an overall 
distribution within the synthetic gene that is typical of such genes (coefficient = 0.85) and also 
for highly expressed human genes (coefficient = 0.50). 

Syngene (Peter Ertl, unpublished), an updated version of the Calcgene program, 
allows exclusion of rare codons to be optional, and is also used to allocate codons 
according to the codon frequency pattern of highly expressed human genes. 
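
The statistical assignment described above can be sketched as weighted random sampling of synonymous codons, with codons below the RSCU cutoff of 0.2 removed before sampling. This is a minimal illustration and not the Calcgene or Syngene code: the RSCU values in the table are hypothetical placeholders covering only four amino acids, and all names are assumptions.

```python
import random

# HYPOTHETICAL RSCU values, for illustration only (not taken from
# Calcgene/Syngene). For each amino acid the values sum to the number
# of synonymous codons, as RSCU values do by definition.
RSCU = {
    "L": {"CTG": 3.0, "CTC": 1.2, "TTG": 0.7, "CTT": 0.6, "CTA": 0.4, "TTA": 0.1},
    "K": {"AAG": 1.5, "AAA": 0.5},
    "F": {"TTC": 1.6, "TTT": 0.4},
    "M": {"ATG": 1.0},
}
RARE_CUTOFF = 0.2  # cutoff stated in the text


def allowed_codons(amino_acid, exclude_rare=True):
    """Return {codon: weight}, optionally dropping codons below the cutoff."""
    table = RSCU[amino_acid]
    if not exclude_rare:
        return dict(table)
    kept = {codon: w for codon, w in table.items() if w >= RARE_CUTOFF}
    return kept or dict(table)  # never leave an amino acid without a codon


def back_translate(protein, rng, exclude_rare=True):
    """Sample one codon per residue in proportion to its RSCU weight."""
    chosen = []
    for amino_acid in protein:
        options = allowed_codons(amino_acid, exclude_rare)
        codons = list(options)
        weights = [options[c] for c in codons]
        chosen.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(chosen)


if __name__ == "__main__":
    # Different seeds give different genes encoding the same peptide.
    for seed in (1, 2, 3):
        print(back_translate("MLKFLLK", random.Random(seed)))
```

Because the codons are sampled rather than fixed, each run with a different seed yields a different nucleotide sequence for the same protein, which is the behaviour the text describes; candidate sequences can then be screened for restriction sites and other features.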

The sequence of the CPC-P501S cassette cloned from the vector pRIT15201 (see 
Figure 7) into pVAC, thereby generating plasmid JNW735, is set forth in SEQ ID NO: 18 
and is illustrated in Figure 12. This sequence is identical to the pRIT15201 sequence with 
the exception of the removal of the His tag and the addition of a Kozak sequence 
(GCCACC) and appropriate restriction enzyme sites. The amino acid sequence of SEQ ID 
NO:18 is shown in single letter format, the start and stop codons are shown in bold. The 
boxed residues are the P2 helper epitope of tetanus toxoid. The underlined residues are 
the Clyta purification tag. The Kozak sequence is denoted by the hash symbols. 

2. - Generation of the recombinant plasmids with P501S codon optimised 
sequences: 

Although the codon coefficient index (CI) of the P501S native sequence is already high 
(0.618), it is possible to increase the CI value further. This will have two potential benefits - to 
improve the antigen expression and/or immunogenicity and to reduce the possibility for 
recombination between the P501S vector and genomic sequences. 

Using the Syngene programme, a selection (SEQ ID NO:19 to SEQ ID NO:20) of codon 
optimised sequences was obtained (Figure 13). Table 2 below shows a comparison of the 
codon coefficient index for the starting P501S sequence and the two representative codon 
optimised sequences, selected on the basis of a suitable restriction enzyme site profile and 
a good CI index. 

Table 2 - Comparison of the codon coefficient indices of two codon optimised P501S 
genes 

Sequence            Codon coefficient index (CI) 
P501S               0.618 
SEQ ID NO:19        0.725 
SEQ ID NO:20        0.755 

3. Further evaluation of the codon-optimised sequences 



Sequence SEQ ID NO:19 

Although SEQ ID NO: 19 has a good CI index (0.725), it contains a doublet of two very 
rare codons at amino acid positions 202 and 203. These codons were manually 
substituted with more frequent codons by changing the DNA sequence from TTGTTG to 
CTGCTG. To facilitate cloning and expression, restriction enzyme sites and a Kozak 
sequence were added. The final engineered sequence (SEQ ID NO:21) is shown in Figure 
14. The Syngene programme was used to fragment this sequence into oligonucleotides 
with a minimum overlap of 19-20 bases. Therefore, Figure 14 shows the re-engineered 
P501S codon optimised sequence 19. Restriction enzyme sites are underlined, Kozak 
sequence is bolded, re-engineered DNA sequence to remove a rare codon doublet is 
boxed. 
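
The manual substitution of a rare codon doublet (TTGTTG to CTGCTG) can be sketched as an in-frame replacement at given amino acid positions, with a check that each change is synonymous. The short open reading frame below is a made-up toy sequence, since SEQ ID NO:19 is not reproduced here, and the function names are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code built from the usual compact TCAG-ordered string.
CODON_TABLE = dict(
    zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO_ACIDS)
)


def codons_of(orf):
    """Split an open reading frame into codons, starting at position 1."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def translate(orf):
    return "".join(CODON_TABLE[c] for c in codons_of(orf))


def substitute_codons(orf, replacements):
    """Replace codons at 1-based amino acid positions; refuse non-synonymous changes."""
    codons = codons_of(orf)
    for position, new_codon in replacements.items():
        old_codon = codons[position - 1]
        if CODON_TABLE[old_codon] != CODON_TABLE[new_codon]:
            raise ValueError(f"{old_codon}->{new_codon} is not synonymous")
        codons[position - 1] = new_codon
    return "".join(codons)


if __name__ == "__main__":
    toy_orf = "ATGGCCTTGTTGAAGTGA"  # toy example: M A L L K stop
    fixed = substitute_codons(toy_orf, {3: "CTG", 4: "CTG"})
    print(toy_orf, translate(toy_orf))
    print(fixed, translate(fixed))
```

Working codon by codon rather than with a plain text replacement avoids altering a TTGTTG hexamer that happens to occur out of frame elsewhere in the gene.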

Using a two-step PCR protocol, the overlapping primers generated by the Syngene 
programme were first assembled using a PCR Assembly protocol (detailed below). The 
assembly reaction generates a diverse population of fragments. The correct full-length 
fragment was recovered/amplified using the PCR recovery protocol and the terminal 
primers. The resulting PCR fragment was excised from an agarose gel, purified, restricted 
with NheI and XhoI and cloned into pVAC. Positive clones were identified by restriction 
enzyme analysis and confirmed by double-stranded sequencing. This generated plasmid 
JNW766, which, due to the error-prone nature of the PCR process, contained a single 
silent mutation (C to T at position 360 of SEQ ID NO: 21). 

1. Assembly reaction - PCR conditions, generic protocol 

Reaction mix (total volume = 50 µl): 

- 1x Reaction buffer (Pfx or Proofstart) 
- 1 µl Oligo pool (equal mix of all overlapping oligos) 
- 0.5mM dNTPs 
- DNA polymerase (Pfx or Proofstart, 2.5-5U) 
- +/- 1mM MgSO4 
- +/- 1x enhancer solution (Pfx enhancer or Proofstart buffer Q) 

1. 94°C for 120s (Proofstart only) 
2. 94°C for 30s 
3. 40°C for 120s 
4. 72°C for 10s 
5. 94°C for 15s 
6. 40°C for 30s 
7. 72°C for 20s + 3s/cycle 
8. Cycle to step 5, 25 times 
9. Hold at 4°C 

2. Recovery reaction - PCR conditions (generic protocol) 

Reaction mix (total volume = 50 µl): 

- 1x Reaction buffer (Pfx or Proofstart) 
- 5-10 µl assembly reaction mix 
- 0.3-0.75mM dNTPs 
- 50pmol primer (5' terminal primer, sense orientation) 
- 50pmol primer (3' terminal primer, anti-sense orientation) 
- DNA polymerase (Pfx or Proofstart, 2.5-5U) 
- +/- 1mM MgSO4 
- +/- 1x enhancer solution (Pfx enhancer or Proofstart buffer Q) 

1. 94°C 120s (Proofstart only) 
2. 94°C 45s 
3. 60°C 30s 
4. 72°C 120s 
5. Cycle to step 2, 25 times 
6. 72°C 240s 
7. Hold at 4°C 

Sequence SEQ ID NO:20 

Although SEQ ID NO: 20 has a very good CI index (0.755), it was noticed that it contained 
a doublet of two very rare codons at amino acid positions 131 and 132. These codons 
were manually substituted with more frequent codons by changing the DNA sequence 
from TTGTTG to CTGCTG. To facilitate cloning, an internal BamHI site was removed by 
mutating G to C (see the double-underlined nucleotide in Figure 15). To facilitate cloning 
and expression, restriction enzyme sites and a Kozak sequence were added. The final 
engineered sequence (SEQ ID NO:22) is shown in Figure 15. The Syngene programme 
was used to fragment this sequence into oligonucleotides with a minimum overlap of 19-20 
bases. 

Figure 15 therefore shows the re-engineered P501S codon optimised sequence 20 (SEQ 
ID NO:22). Restriction enzyme sites are underlined, Kozak sequence is bolded, re- 
engineered DNA sequence to remove a rare codon doublet is boxed and a silent point 
mutation to remove a BamHI site is double-underlined. 

Using a similar two-step PCR protocol to the one described above, full-length P501S 
fragment was amplified and cloned into pVAC. Positive clones were identified by restriction 
enzyme analysis and confirmed by double-stranded sequencing. This generates plasmid 
JNW764. The sequence of the P501S coding cassette is shown in Figure 15 (SEQ ID NO: 
22). 

DNA Sequence similarity 

Pair distances following alignment by the ClustalV (weighted) method are shown in Table 3 
below. The table shows percent similarity between the starting human P501S sequence 
and the two codon optimised sequences SEQ ID NO:21 and 22 selected for further 
investigation. The data confirms that the codon optimised DNA sequences are 
approximately 80% similar to the original P501S sequence. 



Table 3 

SEQ ID NO:    % similarity with starting P501S sequence 
21            79.6 
22            79.4 
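
For orientation, a percent similarity between two aligned DNA sequences can be approximated by simple percent identity over the aligned columns, as in the Python sketch below. This is a simplification introduced here for illustration: the values in Table 3 come from the ClustalV (weighted) method, whose scoring is not reproduced, so the function would not necessarily return the tabulated figures. The function name and the toy sequences are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns with identical bases ('-' denotes a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column that is a gap in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1  # a gap opposite a base never matches
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy example: two synonymous Leu codon changes in a 9-nt stretch.
    print(round(percent_identity("TTGTTGAAA", "CTGCTGAAA"), 1))
```

Since synonymous codon changes mostly affect the third (and occasionally the first) base of a codon, a fully recoded gene typically retains a large majority of identical positions, which is in line with the roughly 80% figures reported.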



Example VII. Codon-optimised CPC sequences 
1.- Approach 

Since the CPC sequence was originally designed for optimal expression in yeast, 
this example describes the process of codon optimising it for human expression. 

2.- Sequence design 

The starting sequence for the optimisation of CPC is shown in Figure 16 (SEQ ID NO: 23). 
This is derived entirely from pRIT15201 and contains the entire coding sequence of 
CPC plus four amino acids of P501S to facilitate downstream cloning. Using the Syngene 
programme, a selection of codon optimised sequences was obtained, from which 
representative sequences are shown in Figure 17 (SEQ ID NO: 24-25). Table 4 below 
shows a comparison of the codon coefficient index for the starting CPC sequence and the 
two representative codon optimised sequences. 



Table 4. Codon coefficient indices for two CPC optimised sequences 

Sequence                        Codon coefficient index (CI) 
Original CPC = SEQ ID NO:23     0.506 
SEQ ID NO:24                    0.809 
SEQ ID NO:25                    0.800 



In addition to the codon optimisation, all sequences were also screened for restriction 
enzyme cloning sites. On the basis of the highest CI value and a favourable restriction 
enzyme site profile, SEQ ID NO: 24 was selected for construction. To facilitate cloning 
and expression, 5' and 3' cloning sites were added and a Kozak sequence (GCCACC) was 
inserted 5' of the initiating ATG start codon. This engineered sequence is shown in Figure 
18 (SEQ ID NO:26). This sequence includes four amino acids of P501S (boxed), restriction 
enzyme cloning sites (NheI and XhoI, underlined), a Kozak sequence (bold), a stop codon 
(italicised) and 4bp of flanking irrelevant DNA to facilitate cloning. 

The Syngene programme was used to fragment this sequence into 50-60-mer 
oligonucleotides with a minimum overlap of 18-20 bases. 
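
A fragmentation of this kind can be sketched as tiling the sequence with fixed-length windows that overlap their neighbours, alternating between sense and antisense strands so that adjacent oligonucleotides can anneal during assembly PCR. This is not the Syngene algorithm, which is not described here; the window length of 60, the overlap of 20, the strand alternation and the toy sequence are all assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def fragment(seq, oligo_len=60, overlap=20):
    """Tile seq into overlapping oligos; returns (start, strand, oligo 5'->3')."""
    if not seq:
        raise ValueError("sequence must not be empty")
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than the oligo")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        end = min(start + oligo_len, len(seq))
        piece = seq[start:end]
        if index % 2 == 0:
            oligos.append((start, "sense", piece))
        else:
            oligos.append((start, "antisense", reverse_complement(piece)))
        if end == len(seq):
            break
        start += step
        index += 1
    # Note: the final oligo may be shorter than oligo_len; a production tool
    # would rebalance lengths so that every oligo stays within 50-60 nt.
    return oligos


if __name__ == "__main__":
    toy_sequence = ("ACGTTGCA" * 18)[:140]  # toy 140-nt sequence
    for start, strand, oligo in fragment(toy_sequence):
        print(start, strand, len(oligo), oligo)
```

With these settings a 140-nucleotide sequence is covered by three 60-mers starting at positions 0, 40 and 80, each sharing 20 bases with its neighbour.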

Using a similar two-step PCR protocol to the one described above, the correct fragment 
was recovered/amplified and cloned into pVAC. Positive clones were identified by 
restriction enzyme analysis and sequence verified, generating vector JNW759. 

4.- DNA similarity 

Pair distances following alignment by the ClustalV (weighted) method are shown in Table 5 below. The 
table shows percent similarity at the DNA level between the starting sequence of CPC and 
the codon optimised sequences and confirms that the codon optimised sequences are 
approximately 80% similar to the original CPC sequence. 

Table 5 

Sequence SEQ ID NO:    % similarity with starting CPC sequence 
24                     80.2 
25                     81.6 



Example VIII. Proposed construction of the P501S fusion candidate 

All the candidates shown in the schematic below are codon optimised and constructed 
using overlapping PCR methodologies from plasmids JNW764 and JNW759 as templates 
(SEQ ID NO: 22 and SEQ ID NO: 26 respectively), and cloned into the expression vector 
p7313ie. 

The four candidates shown schematically below are based upon CPC-P501S. Codon 
optimised CPC-P501S is construct A. Candidates B, C, D also include the sequence 
encoding the N terminal 50 amino acids of P501S, positioned either at the N terminus of 
CPC-P501S (construct D), the C terminus of CPC-P501S (construct C), or between CPC 
and P501S (construct B). A schematic representation of the constructs is given in Figure 
19. 

The nucleotide and protein sequence for each of the four constructs is shown in SEQ ID 
NO: 37-40. In constructs A, C and D, the underlined codon preferentially encodes tyrosine 
(either TAC or TAT) but the nucleotide sequence may be altered to encode threonine 
(either ACA, ACC, ACG or ACT). In construct B, the underlined codon preferentially 
encodes threonine (either ACA, ACC, ACG or ACT), but the nucleotide sequence may be 
altered to encode tyrosine (either TAC or TAT). In all constructs, the coding sequence is 
flanked by appropriate restriction enzyme cloning sites (in this case, NotI and BamHI), and 
a Kozak sequence immediately upstream of the initiating ATG. 



Example IX. Immunogenicity experiments using particle-mediated intra-dermal 
delivery (PMID) studies 

Full-length P501S, when delivered by particle-mediated intra-dermal delivery (PMID), 
generates good antibody and cellular responses. These data demonstrate that PMID is a 
very effective delivery route. Furthermore, comparison of P501S and CPC-P501S confirms 
that CPC-P501S induces a stronger immune response as determined by peptide 
ELISPOT. 

1.- Materials & Methods 

1.1. Cutaneous gene gun immunisation 

Plasmid DNA was precipitated onto 2 µm diameter gold beads using calcium chloride and 
spermidine. Loaded beads were coated onto Tefzel tubing as described (Eisenbraun et al, 
1993; Pertmer et al, 1996). Particle bombardment was performed using the Accell gene 
delivery system (PCT WO 95/19799). For each plasmid, female C57BL/6 mice were 
immunised on days 0, 21, 42 and 70. Each administration consisted of two bombardments 
with DNA/gold, providing a total dose of approximately 4-5 µg of plasmid. 

1.2. ELISPOT assays for T cell responses to the P501S gene product 

a) Preparation of splenocytes 

Spleens were obtained from immunised animals at 7-14 days post boost. Spleens were 
processed by grinding between glass slides to produce a cell suspension. Red blood cells 
were lysed by ammonium chloride treatment and debris was removed to leave a fine 
suspension of splenocytes. Cells were resuspended at a concentration of 8x10^6/ml in 
RPMI complete media for use in ELISPOT assays. 

b) Screening of peptide library 

A peptide library covering a majority of the P501S sequence was obtained from Corixa 
Corp. The library contained fifty 15-20mer peptides overlapping by 4-11 amino acids. 
The peptides are numbered 1-50. In addition, a prediction programme (H-G. 
Rammensee, et al.: Immunogenetics, 1999, 50: 213-219) (http://syfpeithi.bmi-heidelberg.com/) 
was used to predict putative Kb and Db epitopes from the P501S 
sequence. The ten best epitopes for Kb and Db were ordered from Mimotopes (UK) and 
included in the library (peptides 51-70). For screening of the peptide library, peptides were 
used at a final concentration of 50 µg/ml (approx. 25-50 µM) in IFNγ and IL-2 ELISPOTS 
using the protocol described below. For IFNγ ELISPOTS, IL-2 was added to the assays at 
10ng/ml. Splenocytes used for the screening were taken at day 84 from C57BL/6 mice 
immunised at day 0, 21, 42 and 70. Three peptides were identified from the library screen: 
Peptides 18 (HCRQAYSVYAFMISLGGCLG), 22 (GLSAPSLSPHCCPCRARLAF) and 48 
(VCLAAGITYVPPLLLEVGV). These peptides were subsequently used in the ELISPOT 
assays. 
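
The idea of an overlapping peptide library can be illustrated with a minimal sketch. It assumes a fixed peptide length and a fixed overlap, whereas the library described above used 15-20mers with variable overlap; the helper name `tile_peptides` and the toy input are hypothetical and are not taken from the actual library design.

```python
def tile_peptides(sequence, length=15, overlap=11):
    """Split a protein sequence into fixed-length peptides that overlap
    by `overlap` residues (hypothetical helper, for illustration only)."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap
    peptides = []
    for start in range(0, len(sequence), step):
        peptides.append(sequence[start:start + length])
        # Stop once the current peptide reaches the end of the sequence.
        if start + length >= len(sequence):
            break
    return peptides


# Toy input: two of the screen hits joined end to end.
# This is NOT the real P501S sequence.
toy = "HCRQAYSVYAFMISLGGCLG" + "GLSAPSLSPHCCPCRARLAF"
library = tile_peptides(toy, length=20, overlap=10)
for number, peptide in enumerate(library, start=1):
    print(number, peptide)
```

With a real antigen, the full protein sequence would replace the toy string and the length and overlap would be chosen to match the synthesis constraints.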



c) ELISPOT assay 

Plates were coated with 15 µg/ml (in PBS) rat anti mouse IFNγ or rat anti mouse IL-2 
(Pharmingen). Plates were coated overnight at +4°C. Before use the plates were washed 
three times with PBS. Splenocytes were added to the plates at 4x10^5 cells/well. Peptides 
identified in the library screen were re-ordered from Genemed Synthesis and used at a 
final concentration of 50 µg/ml. CPC-P501S protein (GSKBio) was used in the assay at 
20ng/ml. ELISPOT assays were carried out in the presence of either IL-2 (10ng/ml), IL-7 
(10ng/ml) or no cytokine. Total volume in each well was 200 µl. Plates containing peptide 
stimulated cells were incubated for 16 hours in a humidified 37°C incubator. 

e) Development of ELISPOT assay plates. 

Cells were removed from the plates by washing once with water (with 10 minute soak to 
ensure lysis of cells) and three times with PBS. Biotin conjugated rat anti mouse IFNg or 
IL-2 (Pharmingen) was added at 1 µg/ml in PBS. Plates were incubated with shaking for 2 
hours at room temperature. Plates were then washed three times with PBS before addition 
of Streptavidin alkaline phosphatase (Caltag) at 1/1000 dilution. Following three washes in 
PBS spots were revealed by incubation with BCICP substrate (Biorad) for 15-45 mins. 
Substrate was washed off using water and plates were allowed to dry. Spots were 
enumerated using an image analysis system devised by Brian Hayes, Asthma Cell Biology 
unit, GSK. 

1.3. ELISA assay for antibodies to the P501S gene product 

Serum samples were obtained from the animals by venepuncture on days -1, 28, 49 and 
56, and assayed for the presence of anti-P501S antibodies. ELISA was performed using 
Nunc Maxisorp plates coated overnight at 4°C with 0.5 µg/ml of CPC-P501S protein 
(GSKBio) in sodium bicarbonate buffer. After washing with TBS-Tween (Tris-buffered 
saline, pH 7.4 containing 0.05 % of Tween 20) the plates were blocked with Blocking buffer 
(3% BSA in TBS-Tween buffer) for 2hrs at room temperature. All sera were incubated at 
1:100 dilution for 1hr at RT in Blocking buffer. Antibody binding was detected using HRP- 
conjugated rabbit anti-mouse immunoglobulins (#P0260, Dako) at 1:2000 dilution in 
Blocking buffer. Plates were washed again and bound conjugate detected using Fast OPD 
colour reagents (Sigma, UK). The reaction was stopped by the addition of 3M sulphuric 
acid, and the OPD product quantitated by measuring the absorbance at 490nm. 



1.4. Transient transfection assays 

Human P501S expression from various DNA constructs was analysed by transient 
transfection of the plasmids into CHO (Chinese hamster ovary) cells followed by Western 
blotting on total cell protein. Transient transfections were performed with the Transfectam 
reagent (Promega) according to the manufacturer's guidelines. In brief, 24-well tissue 
culture plates were seeded with 5x10^4 CHO cells per well in 1ml DMEM complete medium 
(DMEM, 10% FCS, 2mM L-glutamine, penicillin 100IU/ml, streptomycin 100 µg/ml) and 
incubated for 16 hours at 37°C. 0.5 µg DNA was added to 25 µl of 0.3M NaCl (sufficient for 
one well) and 2 µl of Transfectam was added to 25 µl of Milli-Q. The DNA and Transfectam 
solutions were mixed gently and incubated at room temperature for 15 minutes. During this 
incubation step, the cells were washed once in PBS and covered with 150 µl of serum free 
medium (DMEM, 2mM L-glutamine). The DNA-Transfectam solution was added drop wise 
to the cells, the plate gently shaken and incubated at 37°C for 4-6 hours. 500 µl of DMEM 
complete medium was added and the cells incubated for a further 48-72 hours at 37°C. 

2. Western blot analysis of CHO cells transiently transfected with P501S plasmids 



The transiently transfected CHO cells were washed with PBS and treated with a Versene 
(1:5000)/0.025% trypsin solution to transfer the cells into suspension. Following 
trypsinisation, the CHO cells were pelleted and resuspended in 50 µl of PBS. An equal 
volume of 2x NP40 lysis buffer was added and the cells incubated on ice for 30 minutes. 
100 µl of 2x TRIS-Glycine SDS sample buffer (Invitrogen) containing 50mM DTT was 
added and the solution heated to 95°C for 5 minutes. 1-20 µl of sample was loaded onto a 
4-20% TRIS-Glycine Gel 1.5mm (Invitrogen) and electrophoresed at constant voltage 
(125V) for 90 minutes in 1x TRIS-Glycine buffer (Invitrogen). A pre-stained broad range 
marker (New England Biolabs, #P7708S) was used to size the samples. Following 
electrophoresis, the samples were transferred to Immobilon-P PVDF membrane 
(Millipore), pre-wetted in methanol, using an Xcell III Blot Module (Invitrogen), 1x Transfer 
buffer (Invitrogen) containing 20% methanol and a constant voltage of 25V for 90 minutes. 
The membrane was blocked overnight at 4°C in TBS-Tween (Tris-buffered saline, pH 7.4 
containing 0.05 % of Tween 20) containing 3% dried skimmed milk (Marvel). The primary 
antibody (10E3) was diluted 1:1000 and incubated with the membrane for 1 hour at room 
temperature. Following extensive washing in TBS-Tween, the secondary antibody (HRP- 
conjugated rabbit anti-mouse immunoglobulins (#P0260, Dako)) was diluted 1:2000 in 
TBS-Tween containing 3% dried skimmed milk and incubated with the membrane for one 
hour at room temperature. Following extensive washing, the membrane was incubated 
with Supersignal West Pico Chemiluminescent substrate (Pierce) for 5 minutes. Excess 
liquid was removed and the membrane sealed between two sheets of cling film, and 
exposed to Hyperfilm ECL film (Amersham Pharmacia Biotech) for 1-30 minutes. 

3. Generation of the Full-length human P501S expression cassette 

The starting point for the construction of a P501S expression cassette was the plasmid 
pcDNA3.1-P501S (Corixa Corp), which has a pcDNA3.1 backbone (Invitrogen) containing 
a full-length human P501S cDNA cassette cloned between the EcoRI and NotI sites. This 
vector is also termed JNW673. The presence of P501S was confirmed by fluorescent 
sequencing. The sequence of the cDNA cassette is given by the NCBI/Genbank sequence 
(accession number AY033593). Human P501S was PCR amplified from JNW673 template 
DNA, restricted with XbaI and SalI and cloned into the NheI/XhoI sites of pVAC generating 
vector JNW680. The correct orientation of the fragment relative to the CMV promoter was 
confirmed by PCR and by DNA sequencing. The sequence of the expression cassette is 
shown in Figure 11 (SEQ ID NO: 17). 

To construct a CPC-P501S expression cassette, CPC-P501S was PCR amplified from the 
vector pRIT15201 (see Figure 7), restricted with XbaI and SalI and cloned into the NheI 
and XhoI sites of pVAC, generating plasmid JNW735. The correct orientation was 
confirmed by PCR and sequencing. The sequence of the CPC-P501S expression cassette 
is shown in Figure 12 (SEQ ID NO:18). 

4. Expression of human P501S from plasmids JNW680 and JNW735 

The P501S expression plasmids were transiently transfected into CHO cells and a total 
cell lysate prepared as described in methods. A Western blot of a total cell lysate identified 
single bands of approximately 55kDa and 62kDa for samples transfected with JNW680 
and JNW735 respectively (Figure 20). This is consistent with the predicted molecular 
weights of 59.3kDa and 63.3kDa for P501S and CPC-P501S respectively. The addition of 
the CPC tag does not adversely affect the expression of P501S. 

5. Results 

5.1. Antibody responses to human P501S following PMID immunisation 

The antibody responses following immunisation with pVAC (empty vector) and pVAC- 
P501S (JNW680) were assessed by ELISA following a primary immunisation by PMID at 
day 0 and three boosts at day 21 and day 42 and day 70. Figure 21 shows the antibody 
responses from sera taken at day -1, day 28 and day 49 (mice A1-3, B1-3) and day 56 
(mice A4-6, B4-9). Whilst there were some non-specific responses to the pVAC empty 
vector, specific responses to the P501S construct were seen in 5 of 9 mice. 

5.2. Identification of novel T cell epitopes from human P501S in C57BL/6 mice by 
screening of a P501S peptide library 

Following immunisation with JNW680 (pVAC-P501S) by PMID at day 0 and three boosts 
at day 21 and day 42 and day 70, ELISPOT assays were carried out at day 84. Peptides 
from the P501S library were tested at 50 µg/ml final concentration. From this initial screen, 
three peptides were found to stimulate IFNγ and/or IL-2 secretion: Peptides 18, 22 and 48 
(Figure 22). These peptides were used in subsequent cellular assays. 

5.3. Cellular responses to pVAC-P501S (JNW680) following PMID immunisation 

The cellular responses following immunisation with pVAC (empty vector) and pVAC-P501S 
were assessed by ELISPOT following a primary immunisation by PMID at day 0 and three 
boosts at day 21, 42 and 70. Assays were carried out 7 days post boost. Two different 
assay conditions were used: 1) Peptides 18, 22 and 48 identified in the peptide library 
screen used at 50 µg/ml final concentration and 2) CPC-P501S protein used at 20 µg/ml 
final concentration. Figure 23A shows that whilst there were no P501S-specific responses 
to the empty vector (A4-6), the pVAC-P501S construct induced specific IFN-γ responses to 
Peptides 18 and 22 in all mice (B6-9) whilst one mouse (B7) also showed an IFN-γ 
response to Peptide 48. Figure 23B shows that all mice showed specific IL-2 responses to 
Peptides 18, 22 and 48. Furthermore, pVAC-P501S immunised mice (B6-9) also showed 
moderate IL-2 responses to CPC-P501S, whereas the empty vector immunised mice (A4- 
6) showed no responses. 

5.4. Comparison of cellular responses to P501S and CPC-P501S following PMID 
immunisation. 

The cellular responses following immunisation with pVAC (empty vector), pVAC-P501S 
(JNW680) and CPC-P501S (JNW735) were assessed by ELISPOT following a primary 
immunisation by PMID at day 0 and boosts at day 21 and 42. Assays were carried out 7 
days post boost. Two different assay conditions were used: 1) Peptides 18, 22 and 48 
identified in the peptide library screen used at 50 µg/ml final concentration and 2) 
CPC-P501S protein used at 20 µg/ml final concentration. Figure 24 shows that at day 28, 
CPC-P501S induced good IL-2 responses to 10 µg/ml of peptide 22, whilst there were no 
P501S-specific responses to either the empty vector or the pVAC-P501S. These results 
were also seen using CPC-P501S protein to re-stimulate the splenocytes. At day 49 (post 
2nd boost), the responses induced by P501S and CPC-P501S were equivalent. These data 
suggest that the addition of the CPC tag improves the kinetics and/or magnitude of the 
response to P501S. 

Example IX. Immunogenicity experiments in mice using Protein + adjuvant studies 
1. Design and adjuvant formulation 

The immune response induced by vaccination using the recombinant purified CPC-P501S 
protein formulated in adjuvants is characterized in experiments performed in mice. 
Groups of 5 to 10 eight-week-old female C57BL/6 mice are vaccinated 2-6 times 
intramuscularly at 2-week intervals with 10 µg of the CPC-P501S protein formulated in 
different adjuvant systems. The volume administered corresponds to 1/10th of a human 
dose (50 µl). 

The serology (total Ig response) and cellular response (T cell lymphoproliferation and 
cytokine production) are analyzed on spleen cells, 6-14 days after the last vaccination 
using standard protocols as described in Gerard, C. et al., 2001, Vaccine 19, 2583-2589. 

The data of one representative experiment are shown. It included 5 groups of eight C57BL/6 
female mice which received 4 intramuscular injections of CPC-P501 (10 µg) + adjuvant (A, 
B, C) at days 0, 14, 28, 42. Example V provides an experimental protocol of how to carry 
out the formulations. Briefly the adjuvant formulations are as follows (quantities given for 
one dose of 100 µl): 

- Adjuvant A: QS21 (10 µg), MPL (10 µg) and CpG7909 (100 µg) made according to the 
method disclosed in WO 00/62800; 

- Adjuvant B: formulation of QS21 (20 µg), MPL (20 µg), CpG7909 (100 µg) and 50 µl 
SB62 oil-in-water emulsion (WO 95/17210); 

- Adjuvant C: formulation of QS21 (10 µg), MPL (10 µg), CpG7909 (100 µg) and 10 µl 
SB62 oil-in-water emulsion (WO 99/12565). 

2. Serology 



The total Ig response induced by vaccination was measured by ELISA using either the 
CPC-P501 or RA12-P501 (C term), which is a truncated form of the P501 protein 
corresponding to the C terminus of the protein fused at its N terminus to a TB-derived 
protein, RA12 (Ra12 is derived from the MTB32A antigen described in Skeiky et al., Infection 
and Immun. (1999) 67:3998-4007). 

The adjuvanted CPC-P501S proteins give a good antibody response after vaccination. 

3. Cellular response 

3.1. Lymphoproliferation 

7 days after the latest vaccination, lymphoproliferation was performed on spleen cells 
individually. 2x10^5 spleen cells were plated in quadruplicate, in 96 well microplates, in 
RPMI medium containing 1% normal mouse serum. After 72 hours of re-stimulation with 
either the immunogen (CPC-P501) or the truncated protein (RA12-P501) at different 
concentrations, 1 µCi 3H-thymidine (Amersham 5Ci/ml) was added. After 16 hours, cells 
were harvested onto filter plates. Incorporated radioactivity was counted in a β counter. 
Results are expressed in CPM or as stimulation indexes (geomean CPM in cultures with 
antigen / geomean CPM in cultures without antigen). 
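
The stimulation index defined above is a ratio of geometric means. A minimal sketch of that calculation follows; the function names and the CPM values are made up for illustration and are not data from the experiment.

```python
import math


def geomean(values):
    """Geometric mean of positive counts, e.g. CPM from replicate wells."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, and all values must be > 0")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def stimulation_index(cpm_with_antigen, cpm_without_antigen):
    """Geomean CPM with antigen divided by geomean CPM without antigen."""
    return geomean(cpm_with_antigen) / geomean(cpm_without_antigen)


# Hypothetical quadruplicate readings (illustration only).
with_antigen = [8000, 10000, 12500, 10000]
medium_only = [1000, 1000, 1000, 1000]
si = stimulation_index(with_antigen, medium_only)
print(round(si, 2))
```

The geometric mean is less sensitive than the arithmetic mean to a single outlying well, which is one common reason for using it with replicate counts.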

Re-stimulation with ConA (2 µg/ml) was included as positive control. 

As shown in Figure 25, a P501 specific lymphoproliferation is seen in the spleen of all 
groups of mice receiving the adjuvanted protein after in vitro re-stimulation with either the 
immunogen or another P501 protein made in another expression system (E coli), 
indicating that T cells have been primed in vivo by the vaccination. 

3.2. IFNg production measured by intracellular staining of spleen cells 

Bone Marrow Dendritic Cells (BMDC) were obtained after culture of mouse PBL for 7 days in 
the presence of GM-CSF. 

7 days after the latest vaccination, spleen or PBL are collected and a cell suspension 
prepared. 10^6 cells (1 pool per group) were incubated for approximately 18hrs with 10^5 BMDC pulsed 
overnight with 10 µg/ml of either the CPC-P501 protein or the RA12. 
After a treatment with the 2.4.G.2 antibody, spleen cells were stained with fluorescent anti 
CD4 and CD8 antibodies (anti CD4-APC and an anti CD8PerCP). After a permeabilization 
and fixation step, cells were stained with a fluorescent anti IFNg-FITC antibody. 
In mice vaccinated with CPC-P501 in different adjuvants, both CD4 and CD8 T cells are 
shown to produce IFNg in response to DC pulsed with either the immunogen or the 
C-term P501 made in E. coli (as shown by intracellular staining of spleen and PBLs). There 
is an increase of 4-10x in the % of cells making this cytokine in the groups receiving the 
adjuvanted CPC-P501S compared to the protein alone, and between 0.1 to 10% of CD4 or 
CD8 T cells are shown to produce IFNg. 

In conclusion, these data show that the adjuvanted CPC-P501 protein is 
immunogenic in mice. 

Both P501-specific humoral and cellular responses, including IFNg production by CD4 
and CD8 T cells, can be detected after several intramuscular vaccinations with CPC-P501 in 
adjuvants. 

Claims 

1. A fusion partner protein comprising a choline binding domain and a heterologous 
promiscuous T helper epitope. 

2. A fusion partner protein according to claim 1 wherein the choline binding domain is 
selected from the group comprising: 

a) the C-terminal domain of LytA as set forth in SEQ ID NO:7; or 

b) the sequence of SEQ ID NO:8; or 

c) a peptide sequence comprising an amino acid sequence having at least 85% 
identity, preferably at least 90% identity, more preferably at least 95% identity, 
most preferably at least 97-99% identity, to any of SEQ ID NO:1 to 6; or 

d) a peptide sequence comprising an amino acid sequence having at least 15, 20, 
30, 40, 50 or 100 contiguous amino acids from the amino acid sequence of SEQ 
ID NO:7 or SEQ ID NO:8. 

3. A fusion partner protein as claimed in claim 1 or 2 further comprising a heterologous 
protein. 

4. A fusion protein as claimed in claim 3 wherein the heterologous protein is chemically 
conjugated to the fusion partner. 

5. A fusion protein as claimed in claim 3 or 4 wherein the heterologous protein is a 
tumour associated protein or tissue specific protein or immunogenic fragment thereof. 

6. A fusion protein as claimed in any of claims 3 to 5 wherein the heterologous protein or 
fragment thereof is selected from MAGE 1, MAGE 3, MAGE 4, PRAME, BAGE, LAGE, 
SAGE, HAGE, PSA, PAP, PSCA, prostein, HASH2, Cripto, Prostase, STEAP, 
tyrosinase, telomerase, survivin, CASB616 or her 2 neu. 

7. A fusion protein as claimed in any of claims 4 to 6 further comprising an affinity tag of 
at least 4 histidine residues. 

8. A nucleic acid sequence encoding a protein of claim 1 to 7. 

9. An expression vector comprising a nucleic acid sequence of claim 8. 

10. A host transformed with a nucleic acid sequence of claim 8 or with an expression 
vector of claim 9. 

11. An immunogenic composition comprising a protein as claimed in any of claim 1 to 7 or 
a DNA sequence as claimed in claim 8 and a pharmaceutically acceptable excipient. 

12. An immunogenic composition as claimed in claim 11 which additionally comprises a 
TH-1 inducing adjuvant. 

13. An immunogenic composition as claimed in claim 12 in which the TH-1 inducing 
adjuvant is selected from the group of adjuvants comprising: 3D-MPL, QS21, a mixture 
of QS21 and cholesterol, a CpG oligonucleotide or a mixture of two or more said 
adjuvants. 

14. A process for the preparation of an immunogenic composition as claimed in any of 
claims 11 to 13, comprising admixing the fusion protein of any of claims 4 to 7 or the 
encoding polynucleotide of claim 8 with a suitable adjuvant, diluent or other 
pharmaceutically acceptable carrier. 

15. A process for producing a fusion protein of any of claims 1 to 7 comprising culturing a 
host cell of claim 10 under conditions sufficient for the production of said fusion protein 
and recovering the fusion protein from the culture medium. 

16. A protein of any of claims 1 to 7 or a DNA sequence of claim 8 for use in medicine. 

17. Use of a protein as claimed in any of claim 1 to 7 or a DNA sequence of claim 8 in the 
manufacture of an immunogenic composition for immunotherapeutically treating a 
patient suffering from or susceptible to cancer. 

18. A method of treating a patient suffering from cancer by administering a safe and 
effective amount of a composition of claim 9. 

Figure 1 - Sequence information for C-LytA. 

Each repeat has been defined on the basis of both multiple sequence alignment and 
secondary structure prediction using the following alignment programs: 

1) MatchBox (Depiereux E et al. (1992) Comput Applic Biosci 8:501-9) 

2) ClustalW (Thompson JD et al. (1994) Nucl Acid Res 22:4673-80) 

3) Block-Maker (Henikoff S et al (1995) Gene 163:gc17-26) 

SEQ ID NO:1 - amino acid sequence of C-LytA repeat 1 

GWQKNDTGYWYVHSD 15 

SEQ ID NO:2 - amino acid sequence of C-LytA repeat 2 

GSYPKDKFEKING TWYY FDSS 21 

SEQ ID NO:3 - amino acid sequence of C-LytA repeat 3 

GYMLADRWRKHTDG NWYW FDNS 22 

SEQ ID NO:4 - amino acid sequence of C-LytA repeat 4 

GEMATGWKKIADKWYYFNEE 20 

SEQ ID NO:5 - amino acid sequence of C-LytA repeat 5 

GAMKTGWVKYKDT WYY LDAKE 21 

SEQ ID NO:6 - amino acid sequence of C-LytA repeat 6 

GAMVSNAFIQSADGTG WYYL KPD 23 

SEQ ID NO:7 - amino acid sequence of C-LytA choline-binding domain 

GWQKNDTGYW YVHSDGSYPK DKFEKINGTW YYFDSSGYML ADRWRKHTDG NWYWFDNSGE 60 
MATGWKKIAD KWYYFNEEGA MKTGWVKYKD TWYYLDAKEG AMVSNAFIQS ADGTGWYYLK 120 
PDGTLADRPE FTVEPDGLIT VK 142 
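
As a consistency check, the six repeats (SEQ ID NO:1 to 6) followed by the short C-terminal tail should reproduce SEQ ID NO:7. The sketch below assumes the repeats read as the corresponding stretches of SEQ ID NO:7 (with the motif spacing removed) and uses an offset of 176, taken from the repeat positions in LytA given earlier in the description (R1 starting at residue 177). The names `locate_repeats` and `LYTA_OFFSET` are illustrative only.

```python
REPEATS = [
    ("R1", "GWQKNDTGYWYVHSD"),
    ("R2", "GSYPKDKFEKINGTWYYFDSS"),
    ("R3", "GYMLADRWRKHTDGNWYWFDNS"),
    ("R4", "GEMATGWKKIADKWYYFNEE"),
    ("R5", "GAMKTGWVKYKDTWYYLDAKE"),
    ("R6", "GAMVSNAFIQSADGTGWYYLKPD"),
]
TAIL = "GTLADRPEFTVEPDGLITVK"
SEQ_ID_7 = (
    "GWQKNDTGYWYVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNSGE"
    "MATGWKKIADKWYYFNEEGAMKTGWVKYKDTWYYLDAKEGAMVSNAFIQSADGTGWYYLK"
    "PDGTLADRPEFTVEPDGLITVK"
)
LYTA_OFFSET = 176  # assumption: residue 1 of SEQ ID NO:7 is residue 177 of LytA


def locate_repeats(repeats, full, offset=0):
    """Walk along `full`, checking that each repeat follows the previous one.
    Returns (name, start, end, length) rows and the unmatched remainder."""
    pos = 0
    rows = []
    for name, seq in repeats:
        if full[pos:pos + len(seq)] != seq:
            raise ValueError(f"{name} does not match at position {pos + 1}")
        rows.append((name, pos + 1 + offset, pos + len(seq) + offset, len(seq)))
        pos += len(seq)
    return rows, full[pos:]


rows, remainder = locate_repeats(REPEATS, SEQ_ID_7, LYTA_OFFSET)
for row in rows:
    print(*row)
print("tail:", remainder)
```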

SEQ ID NO:8 - amino acid sequence of C-LytA domain from truncated repeat 1 to repeat 
6 (as part of our constructs shown in figure 2) 

YVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNSGEMATGWKKIADKWYYFNEEGAMKT 
GWVKYKDTWYYLDAKEGAMVSNAFIQSADGTGWYYLKPD 

SEQ ID NO:9 - DNA sequence encoding the amino acid sequence of SEQ ID NO:1 
ggctggcaga agaatgacac tggctactgg tacgtacatt cagac 

SEQ ID NO:10 - DNA sequence encoding the amino acid sequence of SEQ ID NO:2 
ggctcttatc caaaagacaa gtttgagaaa atcaatggca cttggtacta ctttgacagt tca 

SEQ ID NO:11 - DNA sequence encoding the amino acid sequence of SEQ ID NO:3 
ggctatatgc ttgcagaccg ctggaggaag cacacagacg gcaactggta ctggttcgac aactca 

SEQ ID NO:12 - DNA sequence encoding the amino acid sequence of SEQ ID NO:4 
ggcgaaatgg ctacaggctg gaagaaaatc gctgataagt ggtactattt caacgaagaa 

SEQ ID NO:13 - DNA sequence encoding the amino acid sequence of SEQ ID NO:5 
ggtgccatga agacaggctg ggtcaagtac aaggacactt ggtactactt agacgctaaa gaa 

SEQ ID NO:14 - DNA sequence encoding the amino acid sequence of SEQ ID NO:6 

ggcgccatgg tatcaaatgc ctttatccag tcagcggacg gaacaggctg gtactacctc 
aaaccagac 

SEQ ID NO:15 - DNA sequence encoding the amino acid sequence of SEQ ID NO:7 
ggctggcaga agaatgacac tggctactgg tacgtacatt cagacggctc ttatccaaaa 60 
gacaagtttg agaaaatcaa tggcacttgg tactactttg acagttcagg ctatatgctt 120 
gcagaccgct ggaggaagca cacagacggc aactggtact ggttcgacaa ctcaggcgaa 180 
atggctacag gctggaagaa aatcgctgat aagtggtact atttcaacga agaaggtgcc 240 
atgaagacag gctgggtcaa gtacaaggac acttggtact acttagacgc taaagaaggc 300 
gccatggtat caaatgcctt tatccagtca gcggacggaa caggctggta ctacctcaaa 360 
ccagacggaa cactggcaga caggccagaa ttcacagtag agccagatgg cttgattaca 420 
gtaaaataa 429 

SEQ ID NO:16 - DNA sequence encoding the amino acid sequence of SEQ ID NO:8 

TACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGTACTACTTTGACA 
GTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTTCGACAACTCAGG 
CGAAATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGTGCCATGAAGACA 
GGCTGGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGCGCCATGGTATCAAATGCCTTTA 
TCCAGTCAGCGGACGGAACAGGCTGGTACTACCTCAAACCAGAC 
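
The relationship between SEQ ID NO:16 and SEQ ID NO:8 can be checked by translating the DNA with the standard genetic code. The following is a minimal sketch; the sequences are typed in from the listing above with spacing removed, and the helper `translate` is illustrative rather than part of any tool used in the work described.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence (frame 1) into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


SEQ_ID_16 = (
    "TACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGTACTACTTTGACA"
    "GTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTTCGACAACTCAGG"
    "CGAAATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGTGCCATGAAGACA"
    "GGCTGGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGCGCCATGGTATCAAATGCCTTTA"
    "TCCAGTCAGCGGACGGAACAGGCTGGTACTACCTCAAACCAGAC"
)
SEQ_ID_8 = (
    "YVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNSGEMATGWKKIADKWYYFNEEGAMKT"
    "GWVKYKDTWYYLDAKEGAMVSNAFIQSADGTGWYYLKPD"
)

protein = translate(SEQ_ID_16)
print(protein == SEQ_ID_8, len(SEQ_ID_16), len(protein))
```

The same helper can be applied to the other coding sequences in the figures, provided the reading frame starts at the first base supplied.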

Figure 2. CPC and native Constructs 

Construct 1 - coding sequence of CPC-P501 (see plasmid of figure 7 - Y1796) 
Protein sequence (SEQ ID NO:27) 

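MAAAYVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNS 
GEMATGWKKIADKWYYFNEEGAMKTGWVKYKDTWYYLDAKEGA 
MQYIKANSKFIGITEGVMVSNAFIQSADGTGWYYLKPDGTLADRPEKFMY 
MVLGIGPVLGLVCVPLLGSASDHWRGRYGRRRPFIWALSLGILLSLFLIPRAGWLAGLLC 
PDPRPLELALLILGVGLLDFCGQVCFTPLEALLSDLFRDPDHCRQAYSVYAFMISLGGCL 
GYLLPAIDWDTSALAPYLGTQEECLFGLLTLIFLTCVAATLLVAEEAALGPTEPAEGLSA 
PSLSPHCCPCRARLAFRNLGALLPRLHQLCCRMPRTLRRLFVAELCSWMALMTFTLFYTD 
FVGEGLYQGVPRAEPGTEARRHYDEGVRMGSLGLFLQCAISLVFSLVMDRLVQRFGTRAV 
YLASVAAFPVAAGATCLSHSVAVVTASAALTGFTFSALQILPYTLASLYHREKQVFLPKY 
RGDTGGASSEDSLMTSFLPGPKPGAPFPNGHVGAGGSGLLPPPPALCGASACDVSVRVVV 
GEPTEARVVPGRGICLDLAILDSAFLLSQVAPSLFMGSIVQLSQSVTAYMVSAAGLGLVA 
IYFATQVVFDKSDLAKYSAGGHHHHHH 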

R1 (plain): aa5-9 (fragment) R4 (bold): aa53-72 P2 (underline): 97-110 

R2 (bold): aa10-30 R5 (plain): aa73-93 

R3 (plain): aa31-52 R6a (bold): aa94-95 R6b (bold): 113-133 
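
The region coordinates in this legend can be applied directly to the construct sequence. The sketch below slices the N-terminal 145 residues of Construct 1 by those coordinates; the residue string is an assumption, read from the nucleotide sequence of the construct because the printed protein listing is damaged, and the dictionary and function names are illustrative.

```python
# N-terminal 145 residues of Construct 1 (assumed reading, see lead-in).
CONSTRUCT_1_NTERM = (
    "MAAAYVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNSGEMATGWK"
    "KIADKWYYFNEEGAMKTGWVKYKDTWYYLDAKEGAMQYIKANSKFIGITEGVMVSNAFIQ"
    "SADGTGWYYLKPDGTLADRPEKFMY"
)

# 1-based, inclusive coordinates as given in the legend.
REGIONS = {
    "R1": (5, 9),
    "R2": (10, 30),
    "R3": (31, 52),
    "R4": (53, 72),
    "R5": (73, 93),
    "R6a": (94, 95),
    "P2": (97, 110),
    "R6b": (113, 133),
}


def region(sequence, start, end):
    """Return residues start..end (1-based, inclusive)."""
    return sequence[start - 1:end]


segments = {name: region(CONSTRUCT_1_NTERM, *span) for name, span in REGIONS.items()}
for name, segment in segments.items():
    print(f"{name:4s} {segment}")
```

Read this way, R6a and R6b together give repeat 6 of C-LytA, split by the inserted P2 epitope.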

Nucleotide sequence (SEQ ID NO:28) 

ATGgcggccgctTACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGT 
ACTACTTTGACAGTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTT 
CGACAACTCAGGCGAAATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGT 
GCCATGAAGACAGGCTGGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGCGCCatgcaat 
acatcaaggctaactctaagttcattggtatcactgaaggcgtcATGGTATCAAATGCCTTTATCCAGTCAGC 

GGACGGAACAGGCTGGTACTACCTCAAACCAGACGGAACACTGGCAGACAGGCCAGAAaagttcatgtaCatg 
GTGCTGGGCATTGGTCCAGTGCTGGGCCTGGTCTGTGTCCCGCTCCTAGGCTCAGCCAGTGACCACTGGCGTG 
GACGCTATGGCCGCCGCCGGCCCTTCATCTGGGCACTGTCCTTGGGCATCCTGCTGAGCCTCTTTCTCATCCC 
AAGGGCCGGCTGGCTAGCAGGGCTGCTGTGCCCGGATCCCAGGCCCCTGGAGCTGGCACTGCTCATCCTGGGC 
GTGGGGCTGCTGGACTTCTGTGGCCAGGTGTGCTTCACTCCACTGGAGGCCCTGCTCTCTGACCTCTTCCGGG 
ACCCGGACCACTGTCGCCAGGCCTACTCTGTCTATGCCTTCATGATCAGTCTTGGGGGCTGCCTGGGCTACCT 
CCTGCCTGCCATTGACTGGGACACCAGTGCCCTGGCCCCCTACCTGGGCACCCAGGAGGAGTGCCTCTTTGGC 
CTGCTCACCCTCATCTTCCTCACCTGCGTAGCAGCCACACTGCTGGTGGCTGAGGAGGCAGCGCTGGGCCCCA 
CCGAGCCAGCAGAAGGGCTGTCGGCCCCCTCCTTGTCGCCCCACTGCTGTCCATGCCGGGCCCGCTTGGCTTT 
CCGGAACCTGGGCGCCCTGCTTCCCCGGCTGCACCAGCTGTGCTGCCGCATGCCCCGCACCCTGCGCCGGCTC 
TTCGTGGCTGAGCTGTGCAGCTGGATGGCACTCATGACCTTCACGCTGTTTTACACGGATTTCGTGGGCGAGG 
GGCTGTACCAGGGCGTGCCCAGAGCTGAGCCGGGCACCGAGGCCCGGAGACACTATGATGAAGGCGTTCGGAT 
GGGCAGCCTGGGGCTGTTCCTGCAGTGCGCCATCTCCCTGGTCTTCTCTCTGGTCATGGACCGGCTGGTGCAG 
CGATTCGGCACTCGAGCAGTCTATTTGGCCAGTGTGGCAGCTTTCCCTGTGGCTGCCGGTGCCACATGCCTGT 
CCCACAGTGTGGCCGTGGTGACAGCTTCAGCCGCCCTCACCGGGTTCACCTTCTCAGCCCTGCAGATCCTGCC 
CTACACACTGGCCTCCCTCTACCACCGGGAGAAGCAGGTGTTCCTGCCCAAATACCGAGGGGACACTGGAGGT 
GCTAGCAGTGAGGACAGCCTGATGACCAGCTTCCTGCCAGGCCCTAAGCCTGGAGCTCCCTTCCCTAATGGAC 
ACGTGGGTGCTGGAGGCAGTGGCCTGCTCCCACCTCCACCCGCGCTCTGCGGGGCCTCTGCCTGTGAtGTCTC 
CGTACGTGTGGTGGTGGGTGAGCCCACCGAGGCCAGGGTGGTTCCGGGCCGGGGCATCTGCCTGGACCTCGCC 
ATCCTGGATAGTGCCTTCCTGCTGTCCCAGGTGGCCCCATCCCTGTTTATGGGCTCCATTGTCCAGCTCAGCC 
AGTCTGTCACTGCCTATATGGTGTCTGCCGCAGGCCTGGGTCTGGTCGCCATTTACTTTGCTACACAGGTAGT 
ATTTGACAAGAGCGACTTGGCCAAATACTCAGCGggtggacaccatcaccatcaccattaa 

Construct 2 - Coding sequence of P501 HIS (control) (yeast strain SC333) 
Protein sequence (SEQ ID NO:29) 

MVLGIGPVLG LVCVPLLGSA SDHWRGRYGR RRPFIWALSL GILLSLFLIP RAGWLAGLLC 60 
PDPRPLELAL LILGVGLLDF CGQVCFTPLE ALLSDLFRDP DHCRQAYSVY AFMISLGGCL 120 
GYLLPAIDWD TSALAPYLGT QEECLFGLLT LIFLTCVAAT LLVAEEAALG PTEPAEGLSA 180 
PSLSPHCCPC RARLAFRNLG ALLPRLHQLC CRMPRTLRRL FVAELCSWMA LMTFTLFYTD 240 
FVGEGLYQGV PRAEPGTEAR RHYDEGVRMG SLGLFLQCAI SLVFSLVMDR LVQRFGTRAV 300 
YLASVAAFPV AAGATCLSHS VAVVTASAAL TGFTFSALQI LPYTLASLYH REKQVFLPKY 360 
RGDTGGASSE DSLMTSFLPG PKPGAPFPNG HVGAGGSGLL PPPPALCGAS ACDVSVRVVV 420 
GEPTEARVVP GRGICLDLAI LDSAFLLSQV APSLFMGSIV QLSQSVTAYM VSAAGLGLVA 480 
IYFATQVVFD KSDLAKYSAG GHHHHHH 507 

Nucleotide sequence (SEQ ID NO:30) 



atgGTGCTGG GCATTGGTCC AGTGCTGGGC CTGGTCTGTG TCCCGCTCCT AGGCTCAGCC 60 
AGTGACCACT GGCGTGGACG CTATGGCCGC CGCCGGCCCT TCATCTGGGC ACTGTCCTTG 120 
GGCATCCTGC TGAGCCTCTT TCTCATCCCA AGGGCCGGCT GGCTAGCAGG GCTGCTGTGC 180 
CCGGATCCCA GGCCCCTGGA GCTGGCACTG CTCATCCTGG GCGTGGGGCT GCTGGACTTC 240 
TGTGGCCAGG TGTGCTTCAC TCCACTGGAG GCCCTGCTCT CTGACCTCTT CCGGGACCCG 300 
GACCACTGTC GCCAGGCCTA CTCTGTCTAT GCCTTCATGA TCAGTCTTGG GGGCTGCCTG 360 
GGCTACCTCC TGCCTGCCAT TGACTGGGAC ACCAGTGCCC TGGCCCCCTA CCTGGGCACC 420 
CAGGAGGAGT GCCTCTTTGG CCTGCTCACC CTCATCTTCC TCACCTGCGT AGCAGCCACA 480 
CTGCTGGTGG CTGAGGAGGC AGCGCTGGGC CCCACCGAGC CAGCAGAAGG GCTGTCGGCC 540 
CCCTCCTTGT CGCCCCACTG CTGTCCATGC CGGGCCCGCT TGGCTTTCCG GAACCTGGGC 600 
GCCCTGCTTC CCCGGCTGCA CCAGCTGTGC TGCCGCATGC CCCGCACCCT GCGCCGGCTC 660 
TTCGTGGCTG AGCTGTGCAG CTGGATGGCA CTCATGACCT TCACGCTGTT TTACACGGAT 720 
TTCGTGGGCG AGGGGCTGTA CCAGGGCGTG CCCAGAGCTG AGCCGGGCAC CGAGGCCCGG 780 
AGACACTATG ATGAAGGCGT TCGGATGGGC AGCCTGGGGC TGTTCCTGCA GTGCGCCATC 840 
TCCCTGGTCT TCTCTCTGGT CATGGACCGG CTGGTGCAGC GATTCGGCAC TCGAGCAGTC 900 
TATTTGGCCA GTGTGGCAGC TTTCCCTGTG GCTGCCGGTG CCACATGCCT GTCCCACAGT 960 
GTGGCCGTGG TGACAGCTTC AGCCGCCCTC ACCGGGTTCA CCTTCTCAGC CCTGCAGATC 1020 
CTGCCCTACA CACTGGCCTC CCTCTACCAC CGGGAGAAGC AGGTGTTCCT GCCCAAATAC 1080 
CGAGGGGACA CTGGAGGTGC TAGCAGTGAG GACAGCCTGA TGACCAGCTT CCTGCCAGGC 1140 
CCTAAGCCTG GAGCTCCCTT CCCTAATGGA CACGTGGGTG CTGGAGGCAG TGGCCTGCTC 1200 
CCACCTCCAC CCGCGCTCTG CGGGGCCTCT GCCTGTGAtG TCTCCGTACG TGTGGTGGTG 1260 
GGTGAGCCCA CCGAGGCCAG GGTGGTTCCG GGCCGGGGCA TCTGCCTGGA CCTCGCCATC 1320 
CTGGATAGTG CCTTCCTGCT GTCCCAGGTG GCCCCATCCC TGTTTATGGG CTCCATTGTC 1380 
CAGCTCAGCC AGTCTGTCAC TGCCTATATG GTGTCTGCCG CAGGCCTGGG TCTGGTCGCC 1440 
ATTTACTTTG CTACACAGGT AGTATTTGAC AAGAGCGACT TGGCCAAATA CTCAGCGggt 1500 
ggacaccatc accatcacca ttaa 1524 

Construct 3 - Coding sequence of natss-CPC-P501 HIS (yeast strain Y1800) 
Protein sequence (SEQ ID NO:31) 

MAAVQRLWVSRLLRHRKAQLLLVNLLTFGLEVCLAAAYVHSDGSYPKDKFEKINGTWYYFDSSGYMLADRWRKHTDGNWYWFDNS 
GEMATGWKKIADKWYYFNEEGAMKTGWVKYKDTWYYLDAKEGA 
MQYIKANSKFIGITEGVMVSNAFIQSADGTGWYYLKPDGTLADRPEKFMY 
MVLGIGPVLGLVCVPLLGSASDHWRGRYGRRRPFIWALSLGILLSLFLIPRAGWLAGLLC 
PDPRPLELALLILGVGLLDFCGQVCFTPLEALLSDLFRDPDHCRQAYSVYAFMISLGGCL 
GYLLPAIDWDTSALAPYLGTQEECLFGLLTLIFLTCVAATLLVAEEAALGPTEPAEGLSA 
PSLSPHCCPCRARLAFRNLGALLPRLHQLCCRMPRTLRRLFVAELCSWMALMTFTLFYTD 
FVGEGLYQGVPRAEPGTEARRHYDEGVRMGSLGLFLQCAISLVFSLVMDRLVQRFGTRAV 
YLASVAAFPVAAGATCLSHSVAVVTASAALTGFTFSALQILPYTLASLYHREKQVFLPKY 
RGDTGGASSEDSLMTSFLPGPKPGAPFPNGHVGAGGSGLLPPPPALCGASACDVSVRVVV 
GEPTEARVVPGRGICLDLAILDSAFLLSQVAPSLFMGSIVQLSQSVTAYMVSAAGLGLVA 
IYFATQVVFDKSDLAKYSAGGHHHHHH 

R1 (plain): aa38-42 (fragment) R4 (bold): aa77-106 P2 (underline): 130-143 

R2 (bold): aa43-64 R5 (plain): aa107-126 

R3 (plain): aa65-76 R6a (bold): aa127-128 R6b (bold): aa146-166 

natss stands for native signal sequence 
Nucleotide sequence (SEQ ID NO:32) 

ATGgcGGCCGTGCAGAGGCTATGGGTATCGAGACTGCTAAGACACCGCAAAGCTCAGTTGTTGTTGGTTAACT 
TGTTGACCTTCGGGCTGGAAGTCTGTTTGGCggccgctTACGTACATTCCGACGGCTCTTATCCAAAAGACAA 
GTTTGAGAAAATCAATGGCACTTGGTACTACTTTGACAGTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAG 
CACACAGACGGCAACTGGTACTGGTTCGACAACTCAGGCGAAATGGCTACAGGCTGGAAGAAAATCGCTGATA 
AGTGGTACTATTTCAACGAAGAAGGTGCCATGAAGACAGGCTGGGTCAAGTACAAGGACACTTGGTACTACTT 
AGACGCTAAAGAAGGCGCCatgcaatacatcaaggctaactctaagttcattggtatcactgaaggcgtcATG 
GTATCAAATGCCTTTATCCAGTCAGCGGACGGAACAGGCTGGTACTACCTCAAACCAGACGGAACACTGGCAG 
ACAGGCCAGAAaagttcatgtaCatgGTGCTGGGCATTGGTCCAGTGCTGGGCCTGGTCTGTGTCCCGCTCCT 
AGGCTCAGCCAGTGACCACTGGCGTGGACGCTATGGCCGCCGCCGGCCCTTCATCTGGGCACTGTCCTTGGGC 
ATCCTGCTGAGCCTCTTTCTCATCCCAAGGGCCGGCTGGCTAGCAGGGCTGCTGTGCCCGGATCCCAGGCCCC 
TGGAGCTGGCACTGCTCATCCTGGGCGTGGGGCTGCTGGACTTCTGTGGCCAGGTGTGCTTCACTCCACTGGA 
GGCCCTGCTCTCTGACCTCTTCCGGGACCCGGACCACTGTCGCCAGGCCTACTCTGTCTATGCCTTCATGATC 
AGTCTTGGGGGCTGCCTGGGCTACCTCCTGCCTGCCATTGACTGGGACACCAGTGCCCTGGCCCCCTACCTGG 
GCACCCAGGAGGAGTGCCTCTTTGGCCTGCTCACCCTCATCTTCCTCACCTGCGTAGCAGCCACACTGCTGGT 
GGCTGAGGAGGCAGCGCTGGGCCCCACCGAGCCAGCAGAAGGGCTGTCGGCCCCCTCCTTGTCGCCCCACTGC 
TGTCCATGCCGGGCCCGCTTGGCTTTCCGGAACCTGGGCGCCCTGCTTCCCCGGCTGCACCAGCTGTGCTGCC 
GCATGCCCCGCACCCTGCGCCGGCTCTTCGTGGCTGAGCTGTGCAGCTGGATGGCACTCATGACCTTCACGCT 
GTTTTACACGGATTTCGTGGGCGAGGGGCTGTACCAGGGCGTGCCCAGAGCTGAGCCGGGCACCGAGGCCCGG 
AGACACTATGATGAAGGCGTTCGGATGGGCAGCCTGGGGCTGTTCCTGCAGTGCGCCATCTCCCTGGTCTTCT 
CTCTGGTCATGGACCGGCTGGTGCAGCGATTCGGCACTCGAGCAGTCTATTTGGCCAGTGTGGCAGCTTTCCC 
TGTGGCTGCCGGTGCCACATGCCTGTCCCACAGTGTGGCCGTGGTGACAGCTTCAGCCGCCCTCACCGGGTTC 
ACCTTCTCAGCCCTGCAGATCCTGCCCTACACACTGGCCTCCCTCTACCACCGGGAGAAGCAGGTGTTCCTGC 
CCAAATACCGAGGGGACACTGGAGGTGCTAGCAGTGAGGACAGCCTGATGACCAGCTTCCTGCCAGGCCCTAA 
GCCTGGAGCTCCCTTCCCTAATGGACACGTGGGTGCTGGAGGCAGTGGCCTGCTCCCACCTCCACCCGCGCTC 
TGCGGGGCCTCTGCCTGTGAtGTCTCCGTACGTGTGGTGGTGGGTGAGCCCACCGAGGCCAGGGTGGTTCCGG 
GCCGGGGCATCTGCCTGGACCTCGCCATCCTGGATAGTGCCTTCCTGCTGTCCCAGGTGGCCCCATCCCTGTT 
TATGGGCTCCATTGTCCAGCTCAGCCT^GTCTGTCACTGCCTATATGGTGTCTGCCGCAGGCCTGGGTCTGGTC 
GCCATTTACTTTGCTACACAGGTAGTATTTGACAAGAGCGACTTGGC CAAATACTCAGCGgg t gga caccatc 
accatcaccattaa 



Construct 4 - Coding sequence of alphapreCPC-P501 HIS (yeast strain Y1802) 
Protein sequence (SEQ ID NO:33) 

Alpha-pre signal Ri R2 R3 

MAARFP S I FTAVLF AAS 5 ALAAA |YVHSDGS YPKDKFEK INGTWYYFDS SGYfrnjADRWRKHTDGNVTYWFD| 

R4 R5 £g 

[NS GEMATGWKKI ADKWYYFNEEGAM KTGWVKYKDTWYYLDAKEGA|M0YI KANS KF I G I TEGV[MVSNAF l] 

R6 

|QSADGTGWYYIiKPd| gTLADRPEKFMYMVLGIGPVLGI^ 
LIPRAGWIAGLLCPDPRPLEIiALLILGVGLLDFCGQVCFTP 

GYLLPAIDWDTSAIiAPYLGTQEECIiFGLLTLIFLTCVAATLLVAEEAALGPTEPAEGLSAPSLSPH^ 
LAFRinJGALLPRLHQLCCRMPRTLRRLFVAEIjCSWMALMTFTIiFYTDFVGEGLYQGVP 

VRMGSLGIiFLQCAI S LVFSIiVMDRLVQRFGTRAVYL ASVAAFPVAAGATCLSHS VAWTAS AALTGFTFS ALQ 
I IiPYTLASIiYHREKQVFLPKYRGDTGGAS SEDSLMTS FL PGPKPGAPFPNGHVGAGGSGIiLPPPPALCGASAC 
DVS VRVWGEPTEARWPGRG I CLDLAILDS AFLLSQVAPS LFMGS I VQLSQS VTAYMVS AAGLGLVAI YFAT 
QWFDKSDLAKYSAGGHHHHHH 







Alpha-pre signal (bold): aa4-22
R1 (plain): aa24-28 (fragment)
R2 (bold): aa29-49
R3 (plain): aa50-71
R4 (bold): aa72-91
R5 (plain): aa92-112
R6a (bold): aa113-114
P2 (underline): aa116-129
R6b (bold): aa132-152



Alphapre stands for alphapre signal sequence 
Nucleotide sequence (SEQ ID NO:34) 

TACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCAC^^ 

GTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTTCGACAACTCAGG 
CG AAATGG CT ACAGGCTG G AAG AAAATCGCTGAT AAGTGGT ACT ATTT CAACG AAGAAGGTGCCATG AAGAC A 
nnnrpnfinipr* & & rz*p kCAAGGAGft <*vi^G(yr a c?t a p*!* 1 ! 1 AG acg CTAAAGAAGGCGCCa t g ca.ata.ca.tca. agg eta. 
actctaaqttcattgqtatcactqaa qqcqtcATGGTATCAAATGCCTTTATCCAGTCAGCGGACGGAACAGG 
CTGGTACTACCTCAAACCAGACX3GAACACTGGCAGACAGGCCAGAA 

ATGgcGGCCAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCggccgctTACG 

TACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGTACTACTTTGACAGTTC 

AGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTTCGACAACTCAGGCGAA 

ATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGTGCCATGAAGACAGGCT 

nnf3TPaaf3Tar^&r.naraf^TflGTAOTAgTTAGACGCTJUAGAAGGCGCCatq caatacatca 

taagttcattqqtatcactqaa qqcqtcATGGTATCAAATGCCTTTATCCAGTCAGCGGACGGAACAGGCTGG 

TACTACCTCAAACCAGACGGAACACTGGCAGACAGGCCAGAAgctggt at tac 1 1 aegtte caeca t tgt tgt 

tggaagttggtgttgaagaaaagttcatgtaCatgGTGCTGGGCATTGGTCCAGTGCTGGGCCTGGTCTGTGT 

CCCGCTCCTAGGCTCAGCCAGTGACCACTGX3CGTGGACGCTATGGCCGCCGCCGGCCCTTCATCTGGGCACTG 

TCCTTGGGCATCCTGCTGAGCCTCTTTCTCATCCCAAGGGCCGGCTGGCTAGCAGGGCTGCTGTGCCCGGATC 

CCAGGCCCCTGGAGCTGGCACTGCTCATCCTGGGCGTGGGGCTGCTGGACTTCTGTGGCCAGGTGTGCTTCAC 

TCCACTGGAGGCCCTGCTCTCTGACCTCTTCCTCGACCCGGACCACTGTCGCCAGGCCTACTCTGTCT 

TCATGATCAGTCTTGGGGGCTGCCTGGGCTACCTCCTGCCTGCCATTGACTGGGACACCAGTGCCCTGGCCCC 

CTACCTGGGCACCCAGGAGGAGTGCCTCTTTGGCCTGCTCACCCTCATCTTCCTCACCTGCGTAGCAGCCACA 

CTGCTGGTGGCTGAGGAGGCAGCGCTGGGCCCCACCGAGCCAGCAGAAGGGCTGTCGGCCCCCTCCTTGTCGC 

CCCACTGCTGTCCATGCCGXKaCCCGCTTGGCTTTCCGGAACCTGGGCGCCCTGCTTCCCCGGCTGCACCAGCT 

GTGCTGCCGCATGCCCCGCACCCTGCGCCGGCTCTTCGTGXjCTGAGCTGTGCAGCTGGATGGCACTCATGACC 

TTC^CGCTGTTTTACACGX^TTTCGTGGGCGAGGGGCTC 

AGG C CCGGAG AC ACTATGATGAAGGCGTT CGGATGGGCAG CCTGGGGCTGTT C CTGC AGTGCG CCATCTC C CT 
GGTCTTCTCTCTGGTCATGGACCGX3CTGGTGCAGCGATTCGGCACTCGAGC^GTCTATTTGGCCAGTGTGG<^ 
GCTTTCCCTGTGGCTGCCGGTGCCACATGCCTGTCCCACAGTGTGGCCGTGGTGACAGCTTCAGCCGCCCTCA 
CCGGGTTCACCTTCTCAGCCCTGCAGATCCTGCCCTACACACTGGCCTCCCTCTACCACCGGGAGAAGCAGGT 
GTTCCTG CC C AAATAC CGAGGGG ACACTGGAGGTGCTAGC AGTG AGGACAGCCTG ATGACCAGCTTCCTGC CA 
GGCCCTAAGCCTGGAGCTCCCTTCCCTAATGGACACGTGGGTGCTGGAGGCAGTGGCCTGCTCCCACCTCCAC 
CCGCGCTCTGCGGGGCCTCTGCCTGTGAtGTCTCCGTACGTGTGGTGGTGGGTGAGCCCACCGAGGCCAGGGT 
GGTTCCGGGCCGGGGCATCTGCCTGGACCTCGCCATCCTGGATAGTGCCTTCCTGCTGTCCCAGGTGGCCCCA 
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TCCCTGTTTATGGGCTCCATTGTCCAGCTCAGCCAGTCTGTCA.CTGCCTATATGGTGTCTGCCGCAGGCCTGG 

GTCTGGTCGCCATTTACTTTGCT^CACAGGTAGTATTTGACAAGAGCGACCT 

acaccatcaccatcaccattaa^: 

Construct 5 - Coding sequence of alphaprepro-P501 HIS (in plasmid pRIT 15068 and 
yeast strain Y1790) 

Protein sequence (SEQ ID NO:35) 

MSFLNFTAVL FAASSALAAP VNTTTEDETA QIPAEAVIGY SDLEGDFDVA VLPFSNSTNN 60
GLLFINTTIA SIAAKEEGVS LEKREAEAMV LGIGPVLGLV CVPLLGSASD HWRGRYGRRR 120
PFIWALSLGI LLSLFLIPRA GWLAGLLCPD PRPLELALLI LGVGLLDFCG QVCFTPLEAL 180
LSDLFRDPDH CRQAYSVYAF MISLGGCLGY LLPAIDWDTS ALAPYLGTQE ECLFGLLTLI 240
FLTCVAATLL VAEEAALGPT EPAEGLSAPS LSPHCCPCRA RLAFRNLGAL LPRLHQLCCR 300
MPRTLRRLFV AELCSWMALM TFTLFYTDFV GEGLYQGVPR AEPGTEARRH YDEGVRMGSL 360
GLFLQCAISL VFSLVMDRLV QRFGTRAVYL ASVAAFPVAA GATCLSHSVA VVTASAALTG 420
FTFSALQILP YTLASLYHRE KQVFLPKYRG DTGGASSEDS LMTSFLPGPK PGAPFPNGHV 480
GAGGSGLLPP PPALCGASAC DVSVRVVVGE PTEARVVPGR GICLDLAILD SAFLLSQVAP 540
SLFMGSIVQL SQSVTAYMVS AAGLGLVAIY FATQVVFDKS DLAKYSAGGH HHHHH 595




Nucleotide sequence (SEQ ID NO:36) 

ATGAGTTTCC TCAATTTTAC TGCAGTTTTA TTCGCAGCAT CCTCCGCATT AGCTGCTCCA 60
GTCAACACTA CAACAGAAGA TGAAACGGCA CAAATTCCGG CTGAAGCTGT CATCGGTTAC 120
TCAGATTTAG AAGGGGATTT CGATGTTGCT GTTTTGCCAT TTTCCAACAG CACAAATAAC 180
GGGTTATTGT TTATAAATAC TACTATTGCC AGCATTGCTG CTAAAGAAGA AGGGGTATCT 240
CTCGAGAAAA GAGAGGCTGA AGCCatgGTG CTGGGCATTG GTCCAGTGCT GGGCCTGGTC 300
TGTGTCCCGC TCCTAGGCTC AGCCAGTGAC CACTGGCGTG GACGCTATGG CCGCCGCCGG 360
CCCTTCATCT GGGCACTGTC CTTGGGCATC CTGCTGAGCC TCTTTCTCAT CCCAAGGGCC 420
GGCTGGCTAG CAGGGCTGCT GTGCCCGGAT CCCAGGCCCC TGGAGCTGGC ACTGCTCATC 480
CTGGGCGTGG GGCTGCTGGA CTTCTGTGGC CAGGTGTGCT TCACTCCACT GGAGGCCCTG 540
CTCTCTGACC TCTTCCGGGA CCCGGACCAC TGTCGCCAGG CCTACTCTGT CTATGCCTTC 600
ATGATCAGTC TTGGGGGCTG CCTGGGCTAC CTCCTGCCTG CCATTGACTG GGACACCAGT 660
GCCCTGGCCC CCTACCTGGG CACCCAGGAG GAGTGCCTCT TTGGCCTGCT CACCCTCATC 720
TTCCTCACCT GCGTAGCAGC CACACTGCTG GTGGCTGAGG AGGCAGCGCT GGGCCCCACC 780
GAGCCAGCAG AAGGGCTGTC GGCCCCCTCC TTGTCGCCCC ACTGCTGTCC ATGCCGGGCC 840
CGCTTGGCTT TCCGGAACCT GGGCGCCCTG CTTCCCCGGC TGCACCAGCT GTGCTGCCGC 900
ATGCCCCGCA CCCTGCGCCG GCTCTTCGTG GCTGAGCTGT GCAGCTGGAT GGCACTCATG 960
ACCTTCACGC TGTTTTACAC GGATTTCGTG GGCGAGGGGC TGTACCAGGG CGTGCCCAGA 1020
GCTGAGCCGG GCACCGAGGC CCGGAGACAC TATGATGAAG GCGTTCGGAT GGGCAGCCTG 1080
GGGCTGTTCC TGCAGTGCGC CATCTCCCTG GTCTTCTCTC TGGTCATGGA CCGGCTGGTG 1140
CAGCGATTCG GCACTCGAGC AGTCTATTTG GCCAGTGTGG CAGCTTTCCC TGTGGCTGCC 1200
GGTGCCACAT GCCTGTCCCA CAGTGTGGCC GTGGTGACAG CTTCAGCCGC CCTCACCGGG 1260
TTCACCTTCT CAGCCCTGCA GATCCTGCCC TACACACTGG CCTCCCTCTA CCACCGGGAG 1320
AAGCAGGTGT TCCTGCCCAA ATACCGAGGG GACACTGGAG GTGCTAGCAG TGAGGACAGC 1380
CTGATGACCA GCTTCCTGCC AGGCCCTAAG CCTGGAGCTC CCTTCCCTAA TGGACACGTG 1440
GGTGCTGGAG GCAGTGGCCT GCTCCCACCT CCACCCGCGC TCTGCGGGGC CTCTGCCTGT 1500
GAtGTCTCCG TACGTGTGGT GGTGGGTGAG CCCACCGAGG CCAGGGTGGT TCCGGGCCGG 1560
GGCATCTGCC TGGACCTCGC CATCCTGGAT AGTGCCTTCC TGCTGTCCCA GGTGGCCCCA 1620
TCCCTGTTTA TGGGCTCCAT TGTCCAGCTC AGCCAGTCTG TCACTGCCTA TATGGTGTCT 1680
GCCGCAGGCC TGGGTCTGGT CGCCATTTAC TTTGCTACAC AGGTAGTATT TGACAAGAGC 1740
GACTTGGCCA AATACTCAGC Gggtggacac catcaccatc accattaa 1788
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Figure 3. Structure of CPC-p501 His fusion protein expressed in S. cerevisiae 
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Figure 4. Primary structure of CPC-P501 His fusion protein 

MAAAYVHSDG SYPKDKFEKI NGTWYYFDSS GYMLADRWRK HTDGNWYWFD NSGEMATGWK 60
KIADKWYYFN EEGAMKTGWV KYKDTWYYLD AKEGAMQYIK ANSKFIGITE GVMVSNAFIQ 120
SADGTGWYYL KPDGTLADRP EKFMYMVLGI GPVLGLVCVP LLGSASDHWR GRYGRRRPFI 180
WALSLGILLS LFLIPRAGWL AGLLCPDPRP LELALLILGV GLLDFCGQVC FTPLEALLSD 240
LFRDPDHCRQ AYSVYAFMIS LGGCLGYLLP AIDWDTSALA PYLGTQEECL FGLLTLIFLT 300
CVAATLLVAE EAALGPTEPA EGLSAPSLSP HCCPCRARLA FRNLGALLPR LHQLCCRMPR 360
TLRRLFVAEL CSWMALMTFT LFYTDFVGEG LYQGVPRAEP GTEARRHYDE GVRMGSLGLF 420
LQCAISLVFS LVMDRLVQRF GTRAVYLASV AAFPVAAGAT CLSHSVAVVT ASAALTGFTF 480
SALQILPYTL ASLYHREKQV FLPKYRGDTG GASSEDSLMT SFLPGPKPGA PFPNGHVGAG 540
GSGLLPPPPA LCGASACDVS VRVVVGEPTE ARVVPGRGIC LDLAILDSAF LLSQVAPSLF 600
MGSIVQLSQS VTAYMVSAAG LGLVAIYFAT QVVFDKSDLA KYSAGGHHHH HH 652
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Figure 5. Nucleotide sequence of CPC P501 His(pRIT15201) 



ATGGCGGCCG CTTACGTACA TTCCGACGGC TCTTATCCAA AAGACAAGTT TGAGAAAATC 60
AATGGCACTT GGTACTACTT TGACAGTTCA GGCTATATGC TTGCAGACCG CTGGAGGAAG 120
CACACAGACG GCAACTGGTA CTGGTTCGAC AACTCAGGCG AAATGGCTAC AGGCTGGAAG 180
AAAATCGCTG ATAAGTGGTA CTATTTCAAC GAAGAAGGTG CCATGAAGAC AGGCTGGGTC 240
AAGTACAAGG ACACTTGGTA CTACTTAGAC GCTAAAGAAG GCGCCATGCA ATACATCAAG 300
GCTAACTCTA AGTTCATTGG TATCACTGAA GGCGTCATGG TATCAAATGC CTTTATCCAG 360
TCAGCGGACG GAACAGGCTG GTACTACCTC AAACCAGACG GAACACTGGC AGACAGGCCA 420
GAAAAGTTCA TGTACATGGT GCTGGGCATT GGTCCAGTGC TGGGCCTGGT CTGTGTCCCG 480
CTCCTAGGCT CAGCCAGTGA CCACTGGCGT GGACGCTATG GCCGCCGCCG GCCCTTCATC 540
TGGGCACTGT CCTTGGGCAT CCTGCTGAGC CTCTTTCTCA TCCCAAGGGC CGGCTGGCTA 600
GCAGGGCTGC TGTGCCCGGA TCCCAGGCCC CTGGAGCTGG CACTGCTCAT CCTGGGCGTG 660
GGGCTGCTGG ACTTCTGTGG CCAGGTGTGC TTCACTCCAC TGGAGGCCCT GCTCTCTGAC 720
CTCTTCCGGG ACCCGGACCA CTGTCGCCAG GCCTACTCTG TCTATGCCTT CATGATCAGT 780
CTTGGGGGCT GCCTGGGCTA CCTCCTGCCT GCCATTGACT GGGACACCAG TGCCCTGGCC 840
CCCTACCTGG GCACCCAGGA GGAGTGCCTC TTTGGCCTGC TCACCCTCAT CTTCCTCACC 900
TGCGTAGCAG CCACACTGCT GGTGGCTGAG GAGGCAGCGC TGGGCCCCAC CGAGCCAGCA 960
GAAGGGCTGT CGGCCCCCTC CTTGTCGCCC CACTGCTGTC CATGCCGGGC CCGCTTGGCT 1020
TTCCGGAACC TGGGCGCCCT GCTTCCCCGG CTGCACCAGC TGTGCTGCCG CATGCCCCGC 1080
ACCCTGCGCC GGCTCTTCGT GGCTGAGCTG TGCAGCTGGA TGGCACTCAT GACCTTCACG 1140
CTGTTTTACA CGGATTTCGT GGGCGAGGGG CTGTACCAGG GCGTGCCCAG AGCTGAGCCG 1200
GGCACCGAGG CCCGGAGACA CTATGATGAA GGCGTTCGGA TGGGCAGCCT GGGGCTGTTC 1260
CTGCAGTGCG CCATCTCCCT GGTCTTCTCT CTGGTCATGG ACCGGCTGGT GCAGCGATTC 1320
GGCACTCGAG CAGTCTATTT GGCCAGTGTG GCAGCTTTCC CTGTGGCTGC CGGTGCCACA 1380
TGCCTGTCCC ACAGTGTGGC CGTGGTGACA GCTTCAGCCG CCCTCACCGG GTTCACCTTC 1440
TCAGCCCTGC AGATCCTGCC CTACACACTG GCCTCCCTCT ACCACCGGGA GAAGCAGGTG 1500
TTCCTGCCCA AATACCGAGG GGACACTGGA GGTGCTAGCA GTGAGGACAG CCTGATGACC 1560
AGCTTCCTGC CAGGCCCTAA GCCTGGAGCT CCCTTCCCTA ATGGACACGT GGGTGCTGGA 1620
GGCAGTGGCC TGCTCCCACC TCCACCCGCG CTCTGCGGGG CCTCTGCCTG TGATGTCTCC 1680
GTACGTGTGG TGGTGGGTGA GCCCACCGAG GCCAGGGTGG TTCCGGGCCG GGGCATCTGC 1740
CTGGACCTCG CCATCCTGGA TAGTGCCTTC CTGCTGTCCC AGGTGGCCCC ATCCCTGTTT 1800
ATGGGCTCCA TTGTCCAGCT CAGCCAGTCT GTCACTGCCT ATATGGTGTC TGCCGCAGGC 1860
CTGGGTCTGG TCGCCATTTA CTTTGCTACA CAGGTAGTAT TTGACAAGAG CGACTTGGCC 1920
AAATACTCAG CGGGTGGACA CCATCACCAT CACCATTAA 1959
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Figure 6. Cloning strategy for generation of plasmid pRIT 15201 



NdeI
NcoI
SphI

Hybridized oligos P21/P22

5' catgcaatacatcaaggctaactctaagttcattggtatcactgaaggcgt 3'
3' gttatgtagttccgattgagattcaagtaaccatagtgacttccgcagtac 5'



/-Noel 



NcoI

pCUP1




P501S 
aa55-aa553 



pBR327 



tARG3 



PCR amplification using 
CLYTANOTATG 



LEU2 




NcoI digestion

C-lytA_P2_C-lytA
NdeI

(NcoI)
(NcoI)
SphI



atccaaaagacaag 



NcoI
Digestion 



=5 ' aaaaccatggcggccgcttacgt^fattccgacc: 
3' 

CLYTA-aa55 =5' aaacatgtacatgaacttttctggcctgtctgccagtgttc 3' 



NcoI + AflIII
Digestion 



NcoI C-lytA_P2_C-lytA-P501 aa51-
(NcoI)

pCUP1





pBR327 



NcoI

C-lytA_P2_C-lytA 



P501S 

aa51-aa553 HIS 



tARG3 



LEU2 
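
The P21/P22 pair shown in Figure 6 can be checked for base pairing with a short script. The following is only a minimal sketch: the two sequences are copied from the figure, while the function and variable names are hypothetical and the 4-nucleotide overhang length is an assumption based on the NcoI-compatible 5' CATG ends suggested by the NcoI digestion step.

```python
# Oligo sequences copied from Figure 6 (P21/P22); names below are illustrative only.
TOP_5_TO_3 = "catgcaatacatcaaggctaactctaagttcattggtatcactgaaggcgt"
BOTTOM_3_TO_5 = "gttatgtagttccgattgagattcaagtaaccatagtgacttccgcagtac"

# Watson-Crick complement for lowercase DNA letters.
COMPLEMENT = str.maketrans("acgt", "tgca")


def complement(seq):
    """Return the base-by-base complement (same orientation, not reversed)."""
    return seq.translate(COMPLEMENT)


def annealed_duplex(top, bottom_3_to_5, overhang=4):
    """Describe the duplex formed when the two oligos anneal.

    Assumes the top strand carries a 5' overhang of `overhang` bases and the
    bottom strand (written 3'->5') carries the same length at its 5' end.
    Returns (top overhang, whether the middle pairs perfectly,
    bottom overhang written 5'->3').
    """
    paired_top = top[overhang:]
    paired_bottom = bottom_3_to_5[:len(paired_top)]
    is_paired = complement(paired_top) == paired_bottom
    bottom_overhang = bottom_3_to_5[len(paired_top):][::-1]
    return top[:overhang], is_paired, bottom_overhang


top_overhang, is_paired, bottom_overhang = annealed_duplex(TOP_5_TO_3, BOTTOM_3_TO_5)
print(top_overhang, is_paired, bottom_overhang)
```

Under these assumptions both single-stranded ends read 5' catg, which is the overhang left by NcoI, so the annealed pair could be ligated into an NcoI-cut vector in either orientation; orientation would have to be confirmed separately.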
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Figure 7. Plasmid map of pRiT15201 



3 




HIS 



LEU2 
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Figure 8B. 




CP2C-P501S 



1 2 3 4 5 6 7 8 9 10 11 12 13 

1 - Molecular Weight Marker (Biolabs - Broad Range) 175; 83; 62; 47.5; 32.5; 25; 16.5; 6.5 kD - 10 
2 - Purified Reference CP2CP501S/12 135 ng 
3 - Purified Reference CP2CP501S/12 67.8 ng 
4 - Purified Reference CP2CP501S/12 33.9 ng 
5 - Purified Reference CP2CP501S/12 16.9 ng 
6 - Fermentation PRO119-21h30 
7 - Fermentation PRO124-21h30 
8 - Fermentation PRO124-22h30 
9 - Fermentation PRO127-0 h 
10 - Fermentation PRO127-4 h 
11 - Fermentation PRO127-6 h 
12 - Fermentation PRO127-22h20 
13 - Fermentation PRO127-22h45 
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Figure 9. Purification of CPC-P501-His produced by Y1796. 



S. cerevisiae cells 

Dyno-mill disruption 
    OD 120 / 2 passes / 20 mM Tris pH 8.5 - 5 mM EDTA 

Centrifugation 
    12.000 g / RT / 90 min (supernatant discarded) 

Pellet washing step 1 
    20 mM Tris pH 8.5 - 0.15 M NaCl - 2.0 M Guanidine.HCl - 0.1% Empigen (30 min / RT) 

Centrifugation 
    12.000 g / RT / 60 min (supernatant discarded) 

Pellet washing step 2 
    20 mM Tris pH 8.5 - 0.15 M NaCl - 4.0 M Urea 

Centrifugation 
    12.000 g / RT / 30 min (supernatant discarded) 

Solubilisation / Reduction 
    20 mM Tris pH 8.5 - 0.15 M NaCl - 8.0 M Urea - 1% SDS - 0.2 M Glutathione (60 min / RT) 

Centrifugation 
    12.000 g / RT / 30 min (pellet discarded) 

Carbamidomethylation 
    0.3 M Iodoacetamide (30 min / RT / in the dark) / pH adjusted to 8.5 (with 5 M NaOH solution) before incubation 

R/C Supernatant 

10-fold dilution and pH adjustment (8.5) 
    Dilution buffer: 20 mM Tris pH 8.5 - 1 M NaCl - 8.0 M Urea 

Immobilised metal ion affinity chromatography on Ni2+-Chelating Sepharose FF (Amersham) 
(10 x 25 cm column - 2000 ml) 
    Equilibration buffer: 20 mM Tris pH 8.5 - 0.9 M NaCl - 8.0 M Urea - 0.1% SDS 
    Washing buffers: 
    1) Equilibration buffer 
    2) 20 mM Tris pH 8.5 - 0.15 M NaCl - 8.0 M Urea - 0.1% SDS 
    3) 20 mM Tris pH 8.5 - 8.0 M Urea - 0.1% Tween 80 
    Elution buffer: 20 mM Tris pH 8.5 - 8.0 M Urea - 0.1% Tween 80 - 0.5 M Imidazole 

2-fold dilution and pH adjustment (10.0) 
    20 mM Piperazine pH 10.0 - 8.0 M Urea - 0.1% Tween 80 

Anion exchange chromatography on Q Sepharose FF (Amersham) 
(2.6 x 6.5 cm column - 35 ml) 
    Equilibration buffer: 20 mM Piperazine pH 10.0 - 8.0 M Urea - 0.1% Tween 80 
    Washing buffers: 
    1) Equilibration buffer 
    2) 20 mM Tris pH 8.5 - 8.0 M Urea - 0.1% Tween 80 
    Elution buffer: 20 mM Tris pH 7.5 - 8.0 M Urea - 0.1% 

Concentration / Diafiltration 
    +/- 3-fold concentration 
    (Pall - Omega 10 kDa - 200 cm2) 
    Diafiltration buffer: Tris 20 mM pH 7.5 

Sterile filtration 
    (Millipore - Millex GV 0.22 µm) 

Purified bulk 
    Final buffer: 20 mM Tris pH 7.5 - +/- 0.3% Tween 80 
    Storage -20°C 
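
Two of the numbers in the Figure 9 scheme can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the column dimensions are given as diameter x bed height, and that the 10-fold dilution mixes one volume of solubilised material (0.15 M NaCl, 1% SDS) with nine volumes of dilution buffer (1 M NaCl, no SDS). The function names are hypothetical.

```python
import math


def bed_volume_ml(diameter_cm, height_cm):
    """Volume of a cylindrical column bed; 1 cm^3 equals 1 ml."""
    return math.pi * (diameter_cm / 2) ** 2 * height_cm


def after_dilution(sample_conc, buffer_conc, fold):
    """Concentration after mixing 1 part sample with (fold - 1) parts buffer."""
    return (sample_conc + (fold - 1) * buffer_conc) / fold


# Column sizes quoted in Figure 9 (assumed to be diameter x height in cm).
imac_ml = bed_volume_ml(10, 25)    # quoted as 2000 ml
q_ml = bed_volume_ml(2.6, 6.5)     # quoted as 35 ml

# 10-fold dilution of the reduced/carbamidomethylated supernatant.
nacl_molar = after_dilution(0.15, 1.0, 10)   # NaCl, mol/l
sds_percent = after_dilution(1.0, 0.0, 10)   # SDS, %

print(f"IMAC bed: {imac_ml:.0f} ml, Q bed: {q_ml:.1f} ml")
print(f"After dilution: {nacl_molar:.3f} M NaCl, {sds_percent:.2f}% SDS")
```

With these assumptions the computed bed volumes come out close to the quoted 2000 ml and 35 ml, and the diluted load lands near 0.9 M NaCl and 0.1% SDS, which is the composition of the IMAC equilibration buffer.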
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Figure 10. Pattern of CPC P501 His purified protein (4-12% Novex Nu-Page polyacrylamide 
precast gels) 



1234 5 67 1234 




Coomassie Blue R250 Daiichi Silver Staining 




1: MW (250/150/75/50/37/25/15/10 kDa) 
2: Purified bulk A (reducing conditions) 
3: Purified bulk B (reducing conditions) 
4: Purified bulk C (reducing conditions) 
5: Purified bulk A (non reducing conditions) 
6: Purified bulk B (non reducing conditions) 
7: Purified bulk C (non reducing conditions) 



Western Blot anti P501S 
(Monoclonal antibody) 
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Figure 11. Native full-length P501S sequence (SEQ ID NO:17) 



###### 

GCCACCATGGTCCAGAGGCTGTGGGTGAGCCGCCTGCTGCGGCACCGG 

MVQRLWVSRLLRHR 14 

AAAGCCCAGCTCTTGCTGGTCAACCTGCTAACCTTTGGCCTGGAGGTGTGTTTGGCCGCA 
KAQLLLVNLLTFGLEVCLAA 34 

GGCATCACCTATGTGCCGCCTCTGCTGCTGGAAGTGGGGGTAGAGGAGAAGTTCATGACC 
GITYVPPLLLEVGVEEKFMT 54 

ATGGTGCTGGGCATTGGTCCAGTGCTGGGCCTGGTCTGTGTCCCGCTCCTAGGCTCAGCC 
MVLGIGPVLGLVCVPLLGSA 74 

AGTGACCACTGGCGTGGACGCTATGGCCGCCGCCGGCCCTTCATCTGGGCACTGTCCTTG 
SDHWRGRYGRRRPFIWALSL 94 

GGCATCCTGCTGAGCCTCTTTCTCATCCCAAGGGCCGGCTGGCTAGCAGGGCTGCTGTGC 
GILLSLFLIPRAGWLAGLLC 114 

CCGGATCCCAGGCCCCTGGAGCTGGCACTGCTCATCCTGGGCGTGGGGCTGCTGGACTTC 
PDPRPLELALLILGVGLLDF 134 

TGTGGCCAGGTGTGCTTCACTCCACTGGAGGCCCTGCTCTCTGACCTCTTCCGGGACCCG 
CGQVCFTPLEALLSDLFRDP 154 

GACCACTGTCGCCAGGCCTACTCTGTCTATGCCTTCATGATCAGTCTTGGGGGCTGCCTG 
DHCRQAYSVYAFMISLGGCL 174 

GGCTACCTCCTGCCTGCCATTGACTGGGACACCAGTGCCCTGGCCCCCTACCTGGGCACC 
GYLLPAIDWDTSALAPYLGT 194 

CAGGAGGAGTGCCTCTTTGGCCTGCTCACCCTCATCTTCCTCACCTGCGTAGCAGCCACA 
QEECLFGLLTLIFLTCVAAT 214 

CTGCTGGTGGCTGAGGAGGCAGCGCTGGGCCCCACCGAGCCAGCAGAAGGGCTGTCGGCC 
LLVAEEAALGPTEPAEGLSA 234 

CCCTCCTTGTCGCCCCACTGCTGTCCATGCCGGGCCCGCTTGGCTTTCCGGAACCTGGGC 
PSLSPHCCPCRARLAFRNLG 254 

GCCCTGCTTCCCCGGCTGCACCAGCTGTGCTGCCGCATGCCCCGCACCCTGCGCCGGCTC 
ALLPRLHQLCCRMPRTLRRL 274 

TTCGTGGCTGAGCTGTGCAGCTGGATGGCACTCATGACCTTCACGCTGTTTTACACGGAT 
FVAELCSWMALMTFTLFYTD 294 

TTCGTGGGCGAGGGGCTGTACCAGGGCGTGCCCAGAGCTGAGCCGGGCACCGAGGCCCGG 
FVGEGLYQGVPRAEPGTEAR 314 

AGACACTATGATGAAGGCGTTCGGATGGGCAGCCTGGGGCTGTTCCTGCAGTGCGCCATC 
RHYDEGVRMGSLGLFLQCAI 334 

TCCCTGGTCTTCTCTCTGGTCATGGACCGGCTGGTGCAGCGATTCGGCACTCGAGCAGTC 
SLVFSLVMDRLVQRFGTRAV 354 

TATTTGGCCAGTGTGGCAGCTTTCCCTGTGGCTGCCGGTGCCACATGCCTGTCCCACAGT 
YLASVAAFPVAAGATCLSHS 374 

GTGGCCGTGGTGACAGCTTCAGCCGCCCTCACCGGGTTCACCTTCTCAGCCCTGCAGATC 
VAVVTASAALTGFTFSALQI 394 

CTGCCCTACACACTGGCCTCCCTCTACCACCGGGAGAAGCAGGTGTTCCTGCCCAAATAC 
LPYTLASLYHREKQVFLPKY 414 

CGAGGGGACACTGGAGGTGCTAGCAGTGAGGACAGCCTGATGACCAGCTTCCTGCCAGGC 
RGDTGGASSEDSLMTSFLPG 434 

CCTAAGCCTGGAGCTCCCTTCCCTAATGGACACGTGGGTGCTGGAGGCAGTGGCCTGCTC 
PKPGAPFPNGHVGAGGSGLL 454 

CCACCTCCACCCGCGCTCTGCGGGGCCTCTGCCTGTGATGTCTCCGTACGTGTGGTGGTG 
PPPPALCGASACDVSVRVVV 474 

GGTGAGCCCACCGAGGCCAGGGTGGTTCCGGGCCGGGGCATCTGCCTGGACCTCGCCATC 
GEPTEARVVPGRGICLDLAI 494 

CTGGATAGTGCCTTCCTGCTGTCCCAGGTGGCCCCATCCCTGTTTATGGGCTCCATTGTC 
LDSAFLLSQVAPSLFMGSIV 514 

CAGCTCAGCCAGTCTGTCACTGCCTATATGGTGTCTGCCGCAGGCCTGGGTCTGGTCGCC 
QLSQSVTAYMVSAAGLGLVA 534 

ATTTACTTTGCTACACAGGTAGTATTTGACAAGAGCGACTTGGCCAAATACTCAGCGTAG 
IYFATQVVFDKSDLAKYSA* 554 

GTCGAG 
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Figure 12. Sequence of the CPC-P501S expression cassette of JNW735 (SEQ ID NO:18) 



###### 

GCCACCATGGCGGCCGCTTACGTACATTCCGACGGCTCTTATCCAAAA 

MAAAYVHSDGSYPK 14 

GACAAGTTTGAGAAAATCAATGGCACTTGGTACTACTTTGACAGTTCAGGCTATATGCTT 
DKFEKINGTWYYFDSSGYML 34 

GCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTTCGACAACTCAGGCGAA 
ADRWRKHTDGNWYWFDNSGE 54 

ATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGTGCC 
MATGWKKIADKWYYFNEEGA 74 

ATGAAGACAGGCTGGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGC 
MKTGWVKYKDTWYYLDAKEG 94 

GCCATGCAATACATCAAGGCTAACTCTAAGTTCATTGGTATCACTGAAGGCGTCATGGTA 
AM|QYIKANSKFIGITE|GVMV 114 

TCAAATGCCTTTATCCAGTCAGCGGACGGAACAGGCTGGTACTACCTCAAACCAGACGGA 
SNAFIQSADGTGWYYLKPDG 134 

ACACTGGCAGACAGGCCAGAAAAGTTCATGTACATGGTGCTGGGCATTGGTCCAGTGCTG 
TLADRPEKFMYMVLGIGPVL 154 

GGCCTGGTCTGTGTCCCGCTCCTAGGCTCAGCCAGTGACCACTGGCGTGGACGCTATGGC 
GLVCVPLLGSASDHWRGRYG 174 

CGCCGCCGGCCCTTCATCTGGGCACTGTCCTTGGGCATCCTGCTGAGCCTCTTTCTCATC 
RRRPFIWALSLGILLSLFLI 194 

CCAAGGGCCGGCTGGCTAGCAGGGCTGCTGTGCCCGGATCCCAGGCCCCTGGAGCTGGCA 
PRAGWLAGLLCPDPRPLELA 214 

CTGCTCATCCTGGGCGTGGGGCTGCTGGACTTCTGTGGCCAGGTGTGCTTCACTCCACTG 
LLILGVGLLDFCGQVCFTPL 234 

GAGGCCCTGCTCTCTGACCTCTTCCGGGACCCGGACCACTGTCGCCAGGCCTACTCTGTC 
EALLSDLFRDPDHCRQAYSV 254 

TATGCCTTCATGATCAGTCTTGGGGGCTGCCTGGGCTACCTCCTGCCTGCCATTGACTGG 
YAFMISLGGCLGYLLPAIDW 274 

GACACCAGTGCCCTGGCCCCCTACCTGGGCACCCAGGAGGAGTGCCTCTTTGGCCTGCTC 
DTSALAPYLGTQEECLFGLL 294 

ACCCTCATCTTCCTCACCTGCGTAGCAGCCACACTGCTGGTGGCTGAGGAGGCAGCGCTG 
TLIFLTCVAATLLVAEEAAL 314 

GGCCCCACCGAGCCAGCAGAAGGGCTGTCGGCCCCCTCCTTGTCGCCCCACTGCTGTCCA 
GPTEPAEGLSAPSLSPHCCP 334 

TGCCGGGCCCGCTTGGCTTTCCGGAACCTGGGCGCCCTGCTTCCCCGGCTGCACCAGCTG 
CRARLAFRNLGALLPRLHQL 354 

TGCTGCCGCATGCCCCGCACCCTGCGCCGGCTCTTCGTGGCTGAGCTGTGCAGCTGGATG 
CCRMPRTLRRLFVAELCSWM 374 

GCACTCATGACCTTCACGCTGTTTTACACGGATTTCGTGGGCGAGGGGCTGTACCAGGGC 
ALMTFTLFYTDFVGEGLYQG 394 

GTGCCCAGAGCTGAGCCGGGCACCGAGGCCCGGAGACACTATGATGAAGGCGTTCGGATG 
VPRAEPGTEARRHYDEGVRM 414 

GGCAGCCTGGGGCTGTTCCTGCAGTGCGCCATCTCCCTGGTCTTCTCTCTGGTCATGGAC 
GSLGLFLQCAISLVFSLVMD 434 

CGGCTGGTGCAGCGATTCGGCACTCGAGCAGTCTATTTGGCCAGTGTGGCAGCTTTCCCT 
RLVQRFGTRAVYLASVAAFP 454 

GTGGCTGCCGGTGCCACATGCCTGTCCCACAGTGTGGCCGTGGTGACAGCTTCAGCCGCC 
VAAGATCLSHSVAVVTASAA 474 

CTGACCGGGTTCACCTTCTCAGCCCTGCAGATCCTGCCCTACACACTGGCCTCCCTCTAC 
LTGFTFSALQILPYTLASLY 494 

CACCGGGAGAAGCAGGTGTTCCTGCCCAAATACCGAGGGGACACTGGAGGTGCTAGCAGT 
HREKQVFLPKYRGDTGGASS 514 

GAGGACAGCCTGATGACCAGCTTCCTGCCAGGCCCTAAGCCTGGAGCTCCCTTCCCTAAT 
EDSLMTSFLPGPKPGAPFPN 534 

GGACACGTGGGTGCTGGAGGCAGTGGCCTGCTCCCACCTCCACCCGCGCTCTGCGGGGCC 
GHVGAGGSGLLPPPPALCGA 554 

TCTGCCTGTGATGTCTCCGTACGTGTGGTGGTGGGTGAGCCCACCGAGGCCAGGGTGGTT 
SACDVSVRVVVGEPTEARVV 574 

CCGGGCCGGGGCATCTGCCTGGACCTCGCCATCCTGGATAGTGCCTTCCTGCTGTCCCAG 
PGRGICLDLAILDSAFLLSQ 594 

GTGGCCCCATCCCTGTTTATGGGCTCCATTGTCCAGCTCAGCCAGTCTGTCACTGCCTAT 
VAPSLFMGSIVQLSQSVTAY 614 

ATGGTGTCTGCCGCAGGCCTGGGTCTGGTCGCCATTTACTTTGCTACACAGGTAGTATTT 
MVSAAGLGLVAIYFATQVVF 634 

GACAAGAGCGACTTGGCCAAATACTCAGCGTAGGTCGAG 
DKSDLAKYSA* 645 
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Figure 13 



-Two codon optimised P501S sequences (SEQ ID NO:19-20) 



SEQ ID NO:19 

ATGGTGCAGCGGCTCTGGGTGAGCCGCCTCCTGCGGCATCGCAAGGCCCAGCTCCTGCTGGTGAATCTGCTCA 

CATTCGGCCTGGAGGTGTGCCTGGCCGCCGGCATCACCTACGTGCCCCCCCTCCTGCTGGAGGTGGGAGTCGA 

GGAGAAGTTCATGACCATGGTGCTGGGCATTGGGCCCGTCCTGGGCCTCGTGTGCGTGCCTCTCCTCGGCAGC 

GCTTCCGACCATTGGCGCGGCCGGTATGGCCGCAGGAGACCCTTCATCTGGGCTCTGAGTCTCGGCATCCTGC 

TGAGCCTGTTCCTGATCCCTCGGGCCGGCTGGCTGGCCGGGCTGCTGTGCCCCGATCCTCGGCCCCTGGAGCT 

GGCCCTGCTGATCCTCGGCGTGGGCCTGCTGGACTTCTGCGGCCAGGTGTGCTTCACGCCCCTGGAGGCACTG 

CTGAGCGACCTGTTCCGGGACCCCGACCATTGCCGCCAGGCGTACAGCGTGTACGCCTTCATGATCTCCCTGG 

GAGGCTGCCTGGGCTACCTGCTCCCCGCCATCGATTGGGACACCAGCGCACTCGCCCCCTATCTCGGAACACA 

GGAGGAATGCCTGTTCGGATTGTTGACGCTGATCTTCCTCACGTGCGTCGCGGCCACCCTGTTGGTGGCCGAG 

GAGGCCGCCCTGGGGCCCACCGAGCCGGCCGAGGGACTGAGCGCCCCGAGCCTGAGTCCACACTGCTGCCCTT 

GCCGGGCCCGCCTGGCCTTCCGTAATCTGGGCGCCCTCCTGCCTCGGCTCCATCAGCTGTGTTGCAGAATGCC 

TAGGACGCTGCGGCGCCTGTTCGTCGCTGAGTTGTGCTCCTGGATGGCTCTCATGACCTTCACCCTGTTTTAT 

ACGGACTTCGTCGGGGAGGGCCTGTACCAGGGGGTGCCGCGCGCCGAGCCCGGGACAGAGGCGCGCCGCCACT 

ACGACGAGGGAGTGCGTATGGGCTCCCTGGGCCTCTTCTTGCAGTGCGCCATCAGTCTGGTTTTCTCTCTGGT 

CATGGACAGGCTGGTGCAGCGCTTCGGAACCCGGGCGGTGTACCTGGCGAGCGTGGCCGCCTTCCCCGTGGCT 

GCCGGCGCCACCTGCCTCTCTCACTCGGTGGCCGTGGTCACCGCCAGCGCCGCCCTGACCGGGTTCACCTTCT 

CTGCCCTGCAGATTCTGCCTTACACCCTGGCCAGCCTGTACCATCGCGAGAAACAGGTGTTTCTCCCCAAGTA 

CAGAGGCGACACCGGGGGCGCCTCCAGCGAGGACAGCCTCATGACCTCCTTCCTGCCTGGCCCCAAGCCCGGC 

GCCCCTTTCCCCAACGGGCACGTGGGCGCCGGCGGGAGTGGGCTCCTGCCCCCCCCTCCTGCGCTGTGCGGGG 

CCAGCGCCTGCGACGTGAGCGTGCGCGTGGTGGTGGGCGAGCCCACCGAGGCCCGCGTGGTGCCGGGCAGAGG 

CATTTGTCTGGACCTGGCCATCCTCGACTCCGCCTTCCTCCTCAGCCAGGTGGCCCCGTCCCTCTTCATGGGC 

TCTATCGTCCAGCTGTCTCAGAGCGTCACCGCTTACATGGTGTCCGCTGCTGGACTGGGCTTGGTGGCTATTT 

ATTTCGCCACCCAGGTGGTGTTCGACAAGAGCGACCTGGCCAAATACTCCGCCTGA 

SEQ ID NO:20 

ATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGCGCCATAGAAAGGCCCAGTTGCTGCTGGTGAACCTGCTGA 
CTTTCGGACTGGAGGTGTGCCTGGCTGCCGGGATCACGTACGTGCCCCCCCTGCTGCTGGAGGTGGGCGTGGA 
GGAGAAGTTCATGACAATGGTGCTGGGCATCGGCCCCGTCCTGGGCCTCGTGTGTGTGCCCCTCCTCGGGAGT 
GCGTCCGATCATTGGCGGGGCCGCTACGGCCGCCGCAGACCGTTCATCTGGGCCCTGAGCCTGGGGATCCTGC 
TCTCTCTCTTCCTGATCCCCCGGGCCGGCTGGCTGGCCGGCCTGCTGTGTCCCGACCCCCGCCCTCTGGAGCT 
GGCCCTCCTGATCCTGGGCGTGGGCTTGTTGGACTTCTGCGGCCAGGTGTGTTTCACTCCCCTGGAGGCTCTG 
CTCTCCGACCTCTTCCGCGACCCCGACCACTGTAGGCAGGCTTACAGCGTGTACGCCTTCATGATCAGTCTGG 
GGGGATGCCTGGGCTATCTGCTGCCCGCTATCGACTGGGACACCAGCGCCCTGGCCCCCTACCTGGGGACTCA 
GGAGGAGTGCCTGTTCGGCCTGCTCACCTTGATCTTCCTGACGTGCGTCGCCGCCACCCTGCTGGTGGCCGAG 






GAGGCGGCCCTGGGGCCCACCGAGCCCGCCGAGGGCCTGAGCGCTCCCAGCCTGAGCCCCCATTGCTGCCCGT 

GCAGGGCTAGGCTCGCCTTCAGGAATCTGGGCGCTTTGCTGCCCCGCCTGCATCAGCTGTGCTGTCGCATGCC 

TCGCACCCTGCGCCGCCTGTTCGTCGCTGAGCTCTGTTCCTGGATGGCCCTGATGACGTTCACCCTCTTCTAC 

ACCGACTTCGTGGGGGAGGGCCTGTACCAGGGCGTGCCCAGGGCCGAGCCCGGCACCGAGGCTAGGCGCCATT 

ACGACGAGGGCGTCAGGATGGGCTCTCTGGGCCTCTTCCTGCAGTGCGCCATCAGTCTGGTGTTCTCTCTGGT 

GATGGACCGGCTGGTGCAGCGCTTCGGCACCCGGGCCGTGTACCTCGCCTCTGTGGCGGCTTTCCCCGTCGCC 

GCCGGCGCGACCTGCCTGTCTCATTCTGTCGCCGTGGTGACCGCCAGCGCCGCCCTGACCGGCTTCACCTTCA 

GTGCGCTCCAGATTCTGCCCTACACCCTGGCGTCTCTGTACCATCGCGAGAAGCAGGTGTTCCTGCCCAAGTA 

CCGCGGGGACACAGGGGGAGCTTCCTCTGAGGACAGCCTGATGACCAGCTTCTTGCCCGGCCCCAAGCCGGGG 

GCCCCTTTCCCCAACGGCCATGTCGGGGCGGGCGGCAGCGGCCTGCTCCCTCCCCCCCCCGCCCTGTGCGGCG 

CTAGTGCCTGCGACGTGAGCGTGCGGGTGGTGGTGGGGGAGCCCACCGAGGCTAGGGTCGTGCCTGGCCGGGG 

GATCTGCCTGGACCTGGCCATCCTCGACTCCGCCTTCCTGCTCTCCCAGGTGGCGCCCAGCCTGTTCATGGGC 

AGTATCGTGCAGCTGAGCCAGAGCGTGACCGCCTACATGGTGAGCGCCGCCGGCCTGGGGTTGGTGGCCATCT 

ACTTTGCCACCCAGGTCGTGTTCGACAAGAGCGATCTCGCCAAGTATAGCGCCTGA 
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Figure 14 - Re-engineered codon optimised sequence 19 (SEQ ID NO:21) 

GACG GCTAGC GCCACCATGGTGCAGCGGCTCTGGGTGAGCCGCCTCCTGCGGCATCGCAAGGCCCAGCTCCTG 
CTGGTGAATCTGCTCACATTCGGCCTGGAGG^^^ 

TGGAGGTGGGAGTCGAGGAGAAGTTCATGACCATGGTGCTGGGCATTGGGCCCGTCCTGGGCCTCGTGTGCGT 
GCCTCTCCTCGGCAGCGCTTCCGACCATTGGCGCGGCCG 

AGTCTCGGCATCCTGCTGAGCCTGTTCCTGATCCCTCGGGCCGGCTGGCTGGCCGGGCTGCTGTGCCCCGATC 
CTCGGCCCCTGGAGCTGGCCCTGCTGATCCTCGGCGTGGGCCTGCTGGACTTCTGCGGCCAGGTGTGCTTCAC 

GCCCCTGGAGGCACTGCTGAGCGACCTGTTCCGGGAC^ 

TTCATGATCTCCCTGGGAGGCTGCCTGGGCTACCTGCTCCCCGCCATCGATTGGGACACCAGCGCACTCGCCC 

cctatctcggaacacaggaggaatgcctgttcgga^tgQtg 

CCTGTTGGTGG CCGAGGAGGCCGCCCTGGGGC C CACCGAGCCGGCCGAGGGACTGAGCG CCC CG AGCCTGAGT 
CCACACTGCTGCCCTTGCCGGGCCCGCCTGGCCTTCCGTAATCTGGGCGCCCTCCTGCCTCGGCTCCATCAGC 
TGTGTTGCAGAATGCCTAGGACGCTGCGGCGCCTGTTCGTCGCTGAGTTGTGCTCCTC 

CTTCACCCTGTTTTATACGGACTTCGTCGGGGAGGGCCTGTACCAGGGGGTGCCGCGCGCCGAGCCCGGGAC^ 
GAGGCGCGCCGCCACTACGACGAGGGAGTGCGTATGGGCTCCCTGGGCCTCTTCTTGCAGTGCGCCATCAGTC 
TGGTTTTCTCTCTGGTCATGGACAGGCTGGTGCAGCGCTTCGGAACCCGGGCGGTGTACCTGGCGAGCGTGGC 

CGCCTTCCCCGTGGCTGCCGGCGC^CCTO^ 

ACCGGGTTC^CCTTCTCTGCCCTGCAGATTCTGCCTTACACCCTGGCCAGCCTGTACC^TCGCGAGAT^C^GG 
TGTTTCTCCC CAAGTACAGAGG CGACAC CGGGG GCGCCTC CAGCGAGGAC AGCCTCATGACCTC CTTCCTGCC 
TGGCCCCAAGCCCGGCGCCCCTTTCCCCAACGGGCACGTGGGCGCCGGCGGGAGTGGGCTCCTGCCCCCCCCT 
CCTGCGCTGTGCGGGGCCAGCGCCTGCGACGTGAGCGTGCGCGTGGTGGTGGGCGAGCCCACCGAGGCCCGCG 
TGGTGCCGGGCAGAGGCATTTGTCTGGACCTGGCCATCCTCGACTCCGCCTTCCTCCTCAGCCAGGTGGCCCC 
GTCCCTCTTCATGGGCTCTATCGTCCAGCTGTCTCAGAGCGTCACCGCTTACATGGTGTCCGCTGCTGGACTG 
GGCTTGGTGGCTATTTATTTCGCCACCCAGGTGGTGTTCGACAAGAGCGACCTGGCCAAATACTCCGCCTGAC 

TCGAGGGAG 
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Figure 15 - Re-engineered-codon optimised sequence 20 (SEQ ID NO:22) 

GACG GCTAGC GCCACCATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGCGCCATAGAAAGGCCCAGTTGCTG 
CTGGTGAACCTGCTGACTTTCGGACTGGAGGTGTGCCTGGCTGCCGGGATCACGTACGTGCCCCCCCTGCTGC 
TGGAGGTGGGCGTGGAGGAGAAGTTCATGACAATGGTGCTGGGCATCGGCCCCGTCCTGGGCCTCGTGTGTGT 

GCCCCTCCTCGGGAGTGCGTCCGATC^TTGGC^^ 

AGCCTGGGCATCCTGCTCTCTCTCTTCCTGATCCCCCGGGCCGGCTGGCTGGCCGGCCTGCTGTGTCCCGACC 

CCCGCCCTCTGGAGCTGGCCCTCCTGATCCTGGGCGTGGGC§T<QrGGACTTCTGCGGCCAGGTGTGTTTCAC 

TCCCCTGGAGGCTCTGCTCTCCGACCTCTTCCGCGACCCCGACCACTGTAGGCAGGCTTACAGCGTGTACGCC 

TTCATGATCAGTCTGGGGGGATGCCTGGGCTATCTGCTGCCCGCTATCGACTGGGACACCAGCGCCCTGGCCC 

CCTACCTGGGGACTCAGGAGGAGTGCCTGTTCGGCCTGCTCACCTTGATCTTCCTGACGTGCGTCGCCGCCAC 

CCTGCTGGTGGCCGAGGAGGCGGCCCTGGGGCCCACCGAGCCCGCCGAGGGCCTGAGCGCTCCCAGCCTGAGC 

CCCCATTGCTGCCCGTGCAGGGCTAGGCTCGCCTTCAGGAATCTGGGCGCTTTGCTGCCCCGCCTGCATCAGC 

TGTGCTGTCGCATGCCTCGCACCCTGCGCCGCCTGTTCGTCGCTGAGCTCTGTTCCTGGATGGCCCT 

GTTCACCCTCTTCTACACCGACTTCGTGGGGGAGGGCCTGTACCAGGGCGTGCCCAGGGCCGAGCCCGGCACC 

GAGGCTAGGCGCCATTACGACGAGGGCGTCAGG ATGGG CTCTCTGGG C CTCTTCCTGCAGTGCGCCATCAGTC 

TGGTGTTCTCTCTGGTGATGGACCGGCTGGTGCAGCGCTTCGGCACCCGGGCCGTGTACCTCGCCTCTGTGGC 

GGCTTTCCCCGTCGCCGCCGGCGCGACCTGCCTGTCTCATTCTGTCGCCGTGGTGACCGCCAGCGCCGCCCTG 

ACCGGCTTCACCTTC^GTGCGCTCCAGATTCTGCCCTACACCCTGGCGTCTCTGTACCATCGCGAGAAGCAGG 

TGTTCCTGC C CAAGTAC CGCGGGGAC ACAGGGGGAGCTTCCTCTGAGGACAGC CTG ATGACCAGCTTCTTGCC 

CGGCCCCAAGCCGGGGGCCCCTTTCCCCAACGGCCATGTCGGGGCGGGCGGCAGCGGCCTGCTCCCTCCCCCC 

CCCGCCCTGTGCGGCGCTAGTGCCTGCGACGTGAGCGTGCGGGTGGTGGTGGGGGAGCCCACCGAGGCTAGGG 

TCGTGC CTGGC CGGGGGATCTGC CTGGACCTGGCCATCCTCG ACTCCGC CTTC CTG CTCTCCCAGGTGGCG CC 

CAGCCTGTTCATGGGC AGTAT CGTGCAGCTG AG CC AGAGCGTGACCGC CTACATGGTGAGCGCCGCCGG CCTG 

GGGTTGGTGGC CAT CTACTTTGCCACC C AGGTCGTGTTCGACAAGAGCGATCTCGCCAAGTATAGCGC CTGAC 

TCGAGGCAG 
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Figure 16 - The starting sequence for the optimisation of CPC (SEQ ID NO:23) 

Four amino acids of P501S sequence are boxed. 

ATGGCGGCCGCTTACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGGT 

ACTACTTTGACAGTTCAGGCTATATGCTTGCAGACCGCTGGAGGAAGCACACAGACGGCAACTGGTACTGGTT 
CGACAACTCAGGCGAAATGGCTACAGGCTGGAAGAAAATCGCTGATAAGTGGTACTATTTCAACGAAGAAGGT 
GCCATGAAGACAGGCTGGGTCAAGTACAAGGACACTTGGTACTACTTAGACGCTAAAGAAGGCGCCATGCAAT 
ACATCAAGGCTAACTCTAAGTTCATTGGTATCACTGAAGGCGTCATGGTATCAAATGCCTTTATCCAGTCAGC 
GGACGGAACAGGCTGGTACTACCTGAAACCAGACGGAACACTGGCAGACAGGCCAGAAAAGTTCATGTAC 

Figure 17 - Representative codon optimised CPC sequences (SEQ ID NO:24-25) 
SEQIDNO:24- " * ' 

ATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACAAGTTCGAGAAGATCAACGGGACATGGT 
ACTACTTCGACTCCTCCGGCTACATGCTCGCCGACCGCTGGCGGAAGCACACCGACGGCAACTGGTACTGGTT 
CGATAACTCGGGAGAGATGGCCACCGGCTGGAAGAAGATCGCGGACAAGTGGTACTATTTCAACGAGGAGGGC 
GCCATGAAGACCGGCTGGGTGAAGTATAAGGACACCTGGTACTACCTCGACGCCAAGGAGGGCGCCATGCAGT 
ATATCAAGGCCAACAGCAAGTTCATCGGCATCACCGAGGGAGTGATGGTCAGCAACGCCTTTATCCAGAGCGC 
CGACGGCACCGGATGGTACTACTTGAAGCCGGACGGCACCCTCGCGGATCGGCCCGAGAAGTTCATGTAC 

SEQ ID NO:25 

ATGGCCGCCGCCTACGTGCACAGCGACGGGTCCTACCCAAAGGACAAGTTCGAGAAGATCAACGGCACGTGGT 
ACTATTTCGACAGCAGCGGCTACATGCTCGCCGATCGCTGGCGCAAGCACACCGACGGGAACTGGTACTGGTT 
CGACAACTCTGGCGAGATGGCTACGGGGTGGAAGAAGATCGCCGACAAGTGGTACTACTTCAACGAGGAGGGC 
GCCATGAAGACCGGGTGGGTGAAGTACAAGGACACCTGGTACTACCTGGACGCTAAGGAGGGCGCCATGCAGT 
ACATGAAGGCCAACTCGAAGTTCATCGGGATCACCGAGGGCGTGATGGTCAGTAACGCTTTCATCCAGAGCGC 
GGACGGCACAGGCTGGTATTACCTGAAGCCCGATGGCACCCTGGCGGACAGACCTGAGAAATTCATGTAC 
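
As an illustration of what codon optimisation changes and what it leaves alone, the sketch below compares the first 72 nucleotides of the starting CPC sequence (SEQ ID NO:23) with the same stretch of SEQ ID NO:24. It is a minimal example, not the procedure used to design the sequences: the helper names are hypothetical, the standard genetic code is assumed, and only the first printed line of each sequence is used.

```python
from itertools import product

# Standard genetic code in TCAG order (assumed to apply here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons of a DNA string in frame 1."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def gc_fraction(dna):
    """Fraction of G or C bases in the sequence."""
    dna = dna.upper()
    return sum(base in "GC" for base in dna) / len(dna)


# First 24 codons of SEQ ID NO:23 (starting) and SEQ ID NO:24 (optimised).
STARTING = "ATGGCGGCCGCTTACGTACATTCCGACGGCTCTTATCCAAAAGACAAGTTTGAGAAAATCAATGGCACTTGG"
OPTIMISED = "ATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACAAGTTCGAGAAGATCAACGGGACATGG"

# List (codon number, starting codon, optimised codon) wherever they differ.
changed = [
    (i // 3 + 1, STARTING[i:i + 3], OPTIMISED[i:i + 3])
    for i in range(0, len(STARTING), 3)
    if STARTING[i:i + 3] != OPTIMISED[i:i + 3]
]

print(translate(STARTING))
print(translate(OPTIMISED))
print(len(changed), "codons changed")
print(round(gc_fraction(STARTING), 2), round(gc_fraction(OPTIMISED), 2))
```

Over this stretch the two sequences translate to the same peptide while several codons differ and the G+C content rises, which is the expected signature of synonymous codon replacement.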

Figure 18 - Engineered CPC codon optimised sequence (SEQ ID NO:26) 
SEQ ID NO:26 

GACG GCTAGC GCCACCATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACAAGTTCGAGAAG 
ATCAACGGGACATGGTACTACTTCGACTCCTCCGGCTACATGCTCGCCGACCGCTGGCGGAAGCACACCGACG 
GCAACTGGTACTGGTTCGATAACTCGGGAGAGATGGCCACCGGCTGGAAGAAGATCGCGGACAAGTGGTACTA 
TTTCAACGAGGAGGGCGCCATGAAGACCGGCTGGGTGAAGTATAAGGACACCTGGTACTACCTCGACGCCAAG 
GAGGGCGCCATGCAGTATATCAAGGCCAACAGCAAGTTCATCGGCATCACCGAGGGAGTGATGGTCAGCAACG 
CCTTTATCCAGAGCGCCGACGGCACCGGATGGTACTACTTGAAGCCGGACGGCACCCTCGCGGATCGGCCCGA 
G |AAGTTCATGTAC| TGACTCGAG GCAG 
Construct A = SEQ ID NO:37 

GCGGCCGCGCCACCATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACA 
MAAAYVHSDGSYPKDK 
AGTTCGAGAAGATCAACGGGACATGGTACTACTTCGACTCCTCCGGCTACATGCTCGCCG 
FEKINGTWYYFDSSGYMLAD 
ACCGCTGGCGGAAGCACACCGACGGCAACTGGTACTGGTTCGATAACTCGGGAGAGATGG 
RWRKHTDGNWYWFDNSGEMA 
CCACCGGCTGGAAGAAGATCGCGGACAAGTGGTACTATTTCAACGAGGAGGGCGCCATGA 
TGWKKIADKWYYFNEEGAMK 
AGACCGGCTGGGTGAAGTATAAGGACACCTGGTACTACCTCGACGCCAAGGAGGGCGCCA 
TGWVKYKDTWYYLDAKEGAM 
TGCAGTATATCAAGGCCAACAGCAAGTTCATCGGCATCACCGAGGGAGTGATGGTCAGCA 
QYIKANSKFIGITEGVMVSN 
ACGCCTTTATCCAGAGCGCCGACGGCACCGGATGGTACTACTTGAAGCCGGACGGCACCC 
AFIQSADGTGWYYLKPDGTL 
TCGCGGATCGGCCCGAGAAGTTCATGTACATGGTGCTGGGCATCGGCCCCGTCCTGGGCC 
ADRPEKFMYMVLGIGPVLGL 
TCGTGTGTGTGCCCCTCCTCGGGAGTGCGTCCGATCATTGGCGGGGCCGCTACGGCCGCC 
VCVPLLGSASDHWRGRYGRR 
GCAGACCGTTCATCTGGGCCCTGAGCCTGGGCATCCTGCTCTCTCTCTTCCTGATCCCCC 
RPFIWALSLGILLSLFLIPR 
GGGCCGGCTGGCTGGCCGGCCTGCTGTGTCCCGACCCCCGCCCTCTGGAGCTGGCCCTCC 
AGWLAGLLCPDPRPLELALL 
TGATCCTGGGCGTGGGCCTGCTGGACTTCTGCGGCCAGGTGTGTTTCACTCCCCTGGAGG 
ILGVGLLDFCGQVCFTPLEA 
CTCTGCTCTCCGACCTCTTCCGCGACCCCGACCACTGTAGGCAGGCTTACAGCGTGTACG 
LLSDLFRDPDHCRQAYSVYA 
CCTTCATGATCAGTCTGGGGGGATGCCTGGGCTATCTGCTGCCCGCTATCGACTGGGACA 
FMISLGGCLGYLLPAIDWDT 
CCAGCGCCCTGGCCCCCTACCTGGGGACTCAGGAGGAGTGCCTGTTCGGCCTGCTCACCT 
SALAPYLGTQEECLFGLLTL 
TGATCTTCCTGACGTGCGTCGCCGCCACCCTGCTGGTGGCCGAGGAGGCGGCCCTGGGGC 
IFLTCVAATLLVAEEAALGP 
CCACCGAGCCCGCCGAGGGCCTGAGCGCTCCCAGCCTGAGCCCCCATTGCTGCCCGTGCA 
TEPAEGLSAPSLSPHCCPCR 
GGGCTAGGCTCGCCTTCAGGAATCTGGGCGCTTTGCTGCCCCGCCTGCATCAGCTGTGCT 
ARLAFRNLGALLPRLHQLCC 
GTCGCATGCCTCGCACCCTGCGCCGCCTGTTCGTCGCTGAGCTCTGTTCCTGGATGGCCC 
RMPRTLRRLFVAELCSWMAL 
TGATGACGTTCACCCTCTTCTACACCGACTTCGTGGGGGAGGGCCTGTACCAGGGCGTGC 
MTFTLFYTDFVGEGLYQGVP 
CCAGGGCCGAGCCCGGCACCGAGGCTAGGCGCCATTACGACGAGGGCGTCAGGATGGGCT 
RAEPGTEARRHYDEGVRMGS 
CTCTGGGCCTCTTCCTGCAGTGCGCCATCAGTCTGGTGTTCTCTCTGGTGATGGACCGGC 
LGLFLQCAISLVFSLVMDRL 
TGGTGCAGCGCTTCGGCACCCGGGCCGTGTACCTCGCCTCTGTGGCGGCTTTCCCCGTCG 
VQRFGTRAVYLASVAAFPVA 
CCGCCGGCGCGACCTGCCTGTCTCATTCTGTCGCCGTGGTGACCGCCAGCGCCGCCCTGA 
AGATCLSHSVAVVTASAALT 
CCGGCTTCACCTTCAGTGCGCTCCAGATTCTGCCCTACACCCTGGCGTCTCTGTACCATC 
GFTFSALQILPYTLASLYHR 
GCGAGAAGCAGGTGTTCCTGCCCAAGTACCGCGGGGACACAGGGGGAGCTTCCTCTGAGG 
EKQVFLPKYRGDTGGASSED 
ACAGCCTGATGACCAGCTTCTTGCCCGGCCCCAAGCCGGGGGCCCCTTTCCCCAACGGCC 
SLMTSFLPGPKPGAPFPNGH 
ATGTCGGGGCGGGCGGCAGCGGCCTGCTCCCTCCCCCCCCCGCCCTGTGCGGCGCTAGTG 
VGAGGSGLLPPPPALCGASA 
CCTGCGACGTGAGCGTGCGGGTGGTGGTGGGGGAGCCCACCGAGGCTAGGGTCGTGCCTG 
CDVSVRVVVGEPTEARVVPG 
GCCGGGGGATCTGCCTGGACCTGGCCATCCTCGACTCCGCCTTCCTGCTCTCCCAGGTGG 
RGICLDLAILDSAFLLSQVA 
CGCCCAGCCTGTTCATGGGCAGTATCGTGCAGCTGAGCCAGAGCGTGACCGCCTACATGG 
PSLFMGSIVQLSQSVTAYMV 
TGAGCGCCGCCGGCCTGGGGTTGGTGGCCATCTACTTTGCCACCCAGGTCGTGTTCGACA 
SAAGLGLVAIYFATQVVFDK 
AGAGCGATCTCGCCAAGTATAGCGCCTGAGGATCC 
SDLAKYSA* 

Construct B = SEQ ID NO:38 

GCGGCCGCGCCACCATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACA 
MAAAYVHSDGSYPKDK 
AGTTCGAGAAGATCAACGGGACATGGTACTACTTCGACTCCTCCGGCTACATGCTCGCCG 
FEKINGTWYYFDSSGYMLAD 
ACCGCTGGCGGAAGCACACCGACGGCAACTGGTACTGGTTCGATAACTCGGGAGAGATGG 
RWRKHTDGNWYWFDNSGEMA 
CCACCGGCTGGAAGAAGATCGCGGACAAGTGGTACTATTTCAACGAGGAGGGCGCCATGA 
TGWKKIADKWYYFNEEGAMK 
AGACCGGCTGGGTGAAGTATAAGGACACCTGGTACTACCTCGACGCCAAGGAGGGCGCCA 
TGWVKYKDTWYYLDAKEGAM 
TGCAGTATATCAAGGCCAACAGCAAGTTCATCGGCATCACCGAGGGAGTGATGGTCAGCA 
QYIKANSKFIGITEGVMVSN 
ACGCCTTTATCCAGAGCGCCGACGGCACCGGATGGTACTACTTGAAGCCGGACGGCACCC 
AFIQSADGTGWYYLKPDGTL 
TCGCGGATCGGCCCGAGATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGCGCCATAGAA 
ADRPEMVQRLWVSRLLRHRK 
AGGCCCAGTTGCTGCTGGTGAACCTGCTGACTTTCGGACTGGAGGTGTGCCTGGCTGCCG 
AQLLLVNLLTFGLEVCLAAG 
GGATCACGTACGTGCCCCCCCTGCTGCTGGAGGTGGGCGTGGAGGAGAAGTTCATGACAA 
ITYVPPLLLEVGVEEKFMTM 
TGGTGCTGGGCATCGGCCCCGTCCTGGGCCTCGTGTGTGTGCCCCTCCTCGGGAGTGCGT 
VLGIGPVLGLVCVPLLGSAS 
CCGATCATTGGCGGGGCCGCTACGGCCGCCGCAGACCGTTCATCTGGGCCCTGAGCCTGG 
DHWRGRYGRRRPFIWALSLG 
GCATCCTGCTCTCTCTCTTCCTGATCCCCCGGGCCGGCTGGCTGGCCGGCCTGCTGTGTC 
ILLSLFLIPRAGWLAGLLCP 
CCGACCCCCGCCCTCTGGAGCTGGCCCTCCTGATCCTGGGCGTGGGCCTGCTGGACTTCT 
DPRPLELALLILGVGLLDFC 
GCGGCCAGGTGTGTTTCACTCCCCTGGAGGCTCTGCTCTCCGACCTCTTCCGCGACCCCG 
GQVCFTPLEALLSDLFRDPD 
ACCACTGTAGGCAGGCTTACAGCGTGTACGCCTTCATGATCAGTCTGGGGGGATGCCTGG 
HCRQAYSVYAFMISLGGCLG 
GCTATCTGCTGCCCGCTATCGACTGGGACACCAGCGCCCTGGCCCCCTACCTGGGGACTC 
YLLPAIDWDTSALAPYLGTQ 
AGGAGGAGTGCCTGTTCGGCCTGCTCACCTTGATCTTCCTGACGTGCGTCGCCGCCACCC 
EECLFGLLTLIFLTCVAATL 
TGCTGGTGGCCGAGGAGGCGGCCCTGGGGCCCACCGAGCCCGCCGAGGGCCTGAGCGCTC 
LVAEEAALGPTEPAEGLSAP 
CCAGCCTGAGCCCCCATTGCTGCCCGTGCAGGGCTAGGCTCGCCTTCAGGAATCTGGGCG 
SLSPHCCPCRARLAFRNLGA 
CTTTGCTGCCCCGCCTGCATCAGCTGTGCTGTCGCATGCCTCGCACCCTGCGCCGCCTGT 
LLPRLHQLCCRMPRTLRRLF 
TCGTCGCTGAGCTCTGTTCCTGGATGGCCCTGATGACGTTCACCCTCTTCTACACCGACT 
VAELCSWMALMTFTLFYTDF 
TCGTGGGGGAGGGCCTGTACCAGGGCGTGCCCAGGGCCGAGCCCGGCACCGAGGCTAGGC 
VGEGLYQGVPRAEPGTEARR 
GCCATTACGACGAGGGCGTCAGGATGGGCTCTCTGGGCCTCTTCCTGCAGTGCGCCATCA 
HYDEGVRMGSLGLFLQCAIS 
GTCTGGTGTTCTCTCTGGTGATGGACCGGCTGGTGCAGCGCTTCGGCACCCGGGCCGTGT 
LVFSLVMDRLVQRFGTRAVY 
ACCTCGCCTCTGTGGCGGCTTTCCCCGTCGCCGCCGGCGCGACCTGCCTGTCTCATTCTG 
LASVAAFPVAAGATCLSHSV 
TCGCCGTGGTGACCGCCAGCGCCGCCCTGACCGGCTTCACCTTCAGTGCGCTCCAGATTC 
AVVTASAALTGFTFSALQIL 
TGCCCTACACCCTGGCGTCTCTGTACCATCGCGAGAAGCAGGTGTTCCTGCCCAAGTACC 
PYTLASLYHREKQVFLPKYR 
GCGGGGACACAGGGGGAGCTTCCTCTGAGGACAGCCTGATGACCAGCTTCTTGCCCGGCC 
GDTGGASSEDSLMTSFLPGP 
CCAAGCCGGGGGCCCCTTTCCCCAACGGCCATGTCGGGGCGGGCGGCAGCGGCCTGCTCC 
KPGAPFPNGHVGAGGSGLLP 
CTCCCCCCCCCGCCCTGTGCGGCGCTAGTGCCTGCGACGTGAGCGTGCGGGTGGTGGTGG 
PPPALCGASACDVSVRVVVG 
GGGAGCCCACCGAGGCTAGGGTCGTGCCTGGCCGGGGGATCTGCCTGGACCTGGCCATCC 
EPTEARVVPGRGICLDLAIL 
TCGACTCCGCCTTCCTGCTCTCCCAGGTGGCGCCCAGCCTGTTCATGGGCAGTATCGTGC 
DSAFLLSQVAPSLFMGSIVQ 
AGCTGAGCCAGAGCGTGACCGCCTACATGGTGAGCGCCGCCGGCCTGGGGTTGGTGGCCA 
LSQSVTAYMVSAAGLGLVAI 
TCTACTTTGCCACCCAGGTCGTGTTCGACAAGAGCGATCTCGCCAAGTATAGCGCCTGAG 
YFATQVVFDKSDLAKYSA* 
GATCC 
Construct C = SEQ ID NO:39 

GCGGCCGCGCCACCATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACA 
MAAAYVHSDGSYPKDK 
AGTTCGAGAAGATCAACGGGACATGGTACTACTTCGACTCCTCCGGCTACATGCTCGCCG 
FEKINGTWYYFDSSGYMLAD 
ACCGCTGGCGGAAGCACACCGACGGCAACTGGTACTGGTTCGATAACTCGGGAGAGATGG 
RWRKHTDGNWYWFDNSGEMA 
CCACCGGCTGGAAGAAGATCGCGGACAAGTGGTACTATTTCAACGAGGAGGGCGCCATGA 
TGWKKIADKWYYFNEEGAMK 
AGACCGGCTGGGTGAAGTATAAGGACACCTGGTACTACCTCGACGCCAAGGAGGGCGCCA 
TGWVKYKDTWYYLDAKEGAM 
TGCAGTATATCAAGGCCAACAGCAAGTTCATCGGCATCACCGAGGGAGTGATGGTCAGCA 
QYIKANSKFIGITEGVMVSN 
ACGCCTTTATCCAGAGCGCCGACGGCACCGGATGGTACTACTTGAAGCCGGACGGCACCC 
AFIQSADGTGWYYLKPDGTL 
TCGCGGATCGGCCCGAGAAGTTCATGTACATGGTGCTGGGCATCGGCCCCGTCCTGGGCC 
ADRPEKFMYMVLGIGPVLGL 
TCGTGTGTGTGCCCCTCCTCGGGAGTGCGTCCGATCATTGGCGGGGCCGCTACGGCCGCC 
VCVPLLGSASDHWRGRYGRR 
GCAGACCGTTCATCTGGGCCCTGAGCCTGGGCATCCTGCTCTCTCTCTTCCTGATCCCCC 
RPFIWALSLGILLSLFLIPR 
GGGCCGGCTGGCTGGCCGGCCTGCTGTGTCCCGACCCCCGCCCTCTGGAGCTGGCCCTCC 
AGWLAGLLCPDPRPLELALL 
TGATCCTGGGCGTGGGCCTGCTGGACTTCTGCGGCCAGGTGTGTTTCACTCCCCTGGAGG 
ILGVGLLDFCGQVCFTPLEA 
CTCTGCTCTCCGACCTCTTCCGCGACCCCGACCACTGTAGGCAGGCTTACAGCGTGTACG 
LLSDLFRDPDHCRQAYSVYA 
CCTTCATGATCAGTCTGGGGGGATGCCTGGGCTATCTGCTGCCCGCTATCGACTGGGACA 
FMISLGGCLGYLLPAIDWDT 
CCAGCGCCCTGGCCCCCTACCTGGGGACTCAGGAGGAGTGCCTGTTCGGCCTGCTCACCT 
SALAPYLGTQEECLFGLLTL 
CGCCCAGCCTGTTCATGGGCAGTATCGTGCAGCTGAGCCAGAGCGTGACCGCCTACATGG 
PSLFMGSIVQLSQSVTAYMV 
TGAGCGCCGCCGGCCTGGGGTTGGTGGCCATCTACTTTGCCACCCAGGTCGTGTTCGACA 
SAAGLGLVAIYFATQVVFDK 
AGAGCGATCTCGCCAAGTATAGCGCCATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGC 
SDLAKYSAMVQRLWVSRLLR 
GCCATAGAAAGGCCCAGTTGCTGCTGGTGAACCTGCTGACTTTCGGACTGGAGGTGTGCC 
HRKAQLLLVNLLTFGLEVCL 
TGGCTGCCGGGATCACGTACGTGCCCCCCCTGCTGCTGGAGGTGGGCGTGGAGGAGTGAG 
AAGITYVPPLLLEVGVEE* 
GATCC 

Construct D = SEQ ID NO:40 

GCGGCCGCGCCACCATGGTGCAGCGGCTGTGGGTGTCCCGGCTGCTGCGCCATAGAAAGG 
MVQRLWVSRLLRHRKA 
CCCAGTTGCTGCTGGTGAACCTGCTGACTTTCGGACTGGAGGTGTGCCTGGCTGCCGGGA 
QLLLVNLLTFGLEVCLAAGI 
TCACGTACGTGCCCCCCCTGCTGCTGGAGGTGGGCGTGGAGGAGATGGCCGCCGCCTACG 
TYVPPLLLEVGVEEMAAAYV 
TGCATAGCGACGGGAGCTACCCCAAGGACAAGTTCGAGAAGATCAACGGGACATGGTACT 
HSDGSYPKDKFEKINGTWYY 
ACTTCGACTCCTCCGGCTACATGCTCGCCGACCGCTGGCGGAAGCACACCGACGGCAACT 
FDSSGYMLADRWRKHTDGNW 
GGTACTGGTTCGATAACTCGGGAGAGATGGCCACCGGCTGGAAGAAGATCGCGGACAAGT 
YWFDNSGEMATGWKKIADKW 
GGTACTATTTCAACGAGGAGGGCGCCATGAAGACCGGCTGGGTGAAGTATAAGGACACCT 
YYFNEEGAMKTGWVKYKDTW 
GGTACTACCTCGACGCCAAGGAGGGCGCCATGCAGTATATCAAGGCCAACAGCAAGTTCA 
YYLDAKEGAMQYIKANSKFI 
TCGGCATCACCGAGGGAGTGATGGTCAGCAACGCCTTTATCCAGAGCGCCGACGGCACCG 
GITEGVMVSNAFIQSADGTG 
GATGGTACTACTTGAAGCCGGACGGCACCCTCGCGGATCGGCCCGAGAAGTTCATGTACA 
WYYLKPDGTLADRPEKFMYM 
TGGTGCTGGGCATCGGCCCCGTCCTGGGCCTCGTGTGTGTGCCCCTCCTCGGGAGTGCGT 
VLGIGPVLGLVCVPLLGSAS 
CCGATCATTGGCGGGGCCGCTACGGCCGCCGCAGACCGTTCATCTGGGCCCTGAGCCTGG 
DHWRGRYGRRRPFIWALSLG 
GCATCCTGCTCTCTCTCTTCCTGATCCCCCGGGCCGGCTGGCTGGCCGGCCTGCTGTGTC 
ILLSLFLIPRAGWLAGLLCP 
CCGACCCCCGCCCTCTGGAGCTGGCCCTCCTGATCCTGGGCGTGGGCCTGCTGGACTTCT 
DPRPLELALLILGVGLLDFC 
GCGGCCAGGTGTGTTTCACTCCCCTGGAGGCTCTGCTCTCCGACCTCTTCCGCGACCCCG 
GQVCFTPLEALLSDLFRDPD 
ACCACTGTAGGCAGGCTTACAGCGTGTACGCCTTCATGATCAGTCTGGGGGGATGCCTGG 
HCRQAYSVYAFMISLGGCLG 
GCTATCTGCTGCCCGCTATCGACTGGGACACCAGCGCCCTGGCCCCCTACCTGGGGACTC 
YLLPAIDWDTSALAPYLGTQ 
AGGAGGAGTGCCTGTTCGGCCTGCTCACCTTGATCTTCCTGACGTGCGTCGCCGCCACCC 
EECLFGLLTLIFLTCVAATL 
TGCTGGTGGCCGAGGAGGCGGCCCTGGGGCCCACCGAGCCCGCCGAGGGCCTGAGCGCTC 
LVAEEAALGPTEPAEGLSAP 
CCAGCCTGAGCCCCCATTGCTGCCCGTGCAGGGCTAGGCTCGCCTTCAGGAATCTGGGCG 
SLSPHCCPCRARLAFRNLGA 
CTTTGCTGCCCCGCCTGCATCAGCTGTGCTGTCGCATGCCTCGCACCCTGCGCCGCCTGT 
LLPRLHQLCCRMPRTLRRLF 
TCGTCGCTGAGCTCTGTTCCTGGATGGCCCTGATGACGTTCACCCTCTTCTACACCGACT 
VAELCSWMALMTFTLFYTDF 
TCGTGGGGGAGGGCCTGTACCAGGGCGTGCCCAGGGCCGAGCCCGGCACCGAGGCTAGGC 
VGEGLYQGVPRAEPGTEARR 
GCCATTACGACGAGGGCGTCAGGATGGGCTCTCTGGGCCTCTTCCTGCAGTGCGCCATCA 
HYDEGVRMGSLGLFLQCAIS 
GTCTGGTGTTCTCTCTGGTGATGGACCGGCTGGTGCAGCGCTTCGGCACCCGGGCCGTGT 
LVFSLVMDRLVQRFGTRAVY 
ACCTCGCCTCTGTGGCGGCTTTCCCCGTCGCCGCCGGCGCGACCTGCCTGTCTCATTCTG 
LASVAAFPVAAGATCLSHSV 
TCGCCGTGGTGACCGCCAGCGCCGCCCTGACCGGCTTCACCTTCAGTGCGCTCCAGATTC 
AVVTASAALTGFTFSALQIL 
TGCCCTACACCCTGGCGTCTCTGTACCATCGCGAGAAGCAGGTGTTCCTGCCCAAGTACC 
PYTLASLYHREKQVFLPKYR 
GCGGGGACACAGGGGGAGCTTCCTCTGAGGACAGCCTGATGACCAGCTTCTTGCCCGGCC 
GDTGGASSEDSLMTSFLPGP 
CCAAGCCGGGGGCCCCTTTCCCCAACGGCCATGTCGGGGCGGGCGGCAGCGGCCTGCTCC 
KPGAPFPNGHVGAGGSGLLP 
CTCCCCCCCCCGCCCTGTGCGGCGCTAGTGCCTGCGACGTGAGCGTGCGGGTGGTGGTGG 
PPPALCGASACDVSVRVVVG 
GGGAGCCCACCGAGGCTAGGGTCGTGCCTGGCCGGGGGATCTGCCTGGACCTGGCCATCC 
EPTEARVVPGRGICLDLAIL 
TCGACTCCGCCTTCCTGCTCTCCCAGGTGGCGCCCAGCCTGTTCATGGGCAGTATCGTGC 
DSAFLLSQVAPSLFMGSIVQ 
AGCTGAGCCAGAGCGTGACCGCCTACATGGTGAGCGCCGCCGGCCTGGGGTTGGTGGCCA 
LSQSVTAYMVSAAGLGLVAI 
TCTACTTTGCCACCCAGGTCGTGTTCGACAAGAGCGATCTCGCCAAGTATAGCGCCTGAG 
YFATQVVFDKSDLAKYSA* 

GATCC 
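
In Constructs A to D, each nucleotide row is followed by the amino acids it encodes. The pairing can be checked with a short script such as the sketch below. This is an illustrative sketch only and is not part of the original disclosure: the function name translate and the variable first_row are hypothetical, and it assumes the standard genetic code and a reading frame that starts at the first ATG. A codon that spans two rows is only translated once the rows are joined.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first ATG up to and including the first stop ('*')."""
    seq = "".join(dna.split()).upper()  # drop whitespace left by line wrapping
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        # 'X' marks a codon containing a character that is not T, C, A or G.
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


# First nucleotide row of Construct A (SEQ ID NO:37).
first_row = "GCGGCCGCGCCACCATGGCCGCCGCCTACGTGCATAGCGACGGGAGCTACCCCAAGGACA"
print(translate(first_row))  # prints MAAAYVHSDGSYPKD
```

The final K of the printed amino acid row (MAAAYVHSDGSYPKDK) is encoded by the last A of the first nucleotide row together with the AG that opens the second row, so it appears only when the rows are concatenated before translation.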
Figure 20 - Western blot analysis of CHO cells following transient transfection with 
P501S (JNW680), CPC-P501S (JNW735) and empty vector control. 



Lane  Sample 
1     CPC-P501S (JNW735) 
2     CPC-P501S protein (62.5 ng) 
3     P501S (JNW680) 
4     P501S (JNW680) 
5     Empty vector control 
Figure 21 - Anti-P501S antibody responses following immunisation at day 0, 21 and 42 
with pVAC-P501S (JNW680, mice B1-9) or empty vector (pVAC, mice A1-6). 
A pre-bleed was taken at day -1. Subsequent bleeds were taken at day 28 and day 49 
(mice A1-3, B1-3) and day 56 (mice A4-6, B4-9). All sera were tested at 1/100 dilution. The 
results for the pVAC immunised mice were averaged. The results for the individual 
pVAC-P501S immunised mice are shown. As a positive control, sera from Adeno-P501S 
immunised mice (Corixa Corp, diluted 1/100) are included. 
Figure 22 - Peptide library screen using C57BL/6 mice immunised at day 0, 21, 42, 
and 70 with pVAC-P501S (JNW680). 

All peptides were used at a final concentration of 50 µg/ml. Peptides 1-50 are overlapping 
15-20mers obtained from Corixa. Peptides 51-70 are predicted 8-9mer Kb and Db 
epitopes and were ordered from Mimotopes (UK). Samples 71-72 and 73-78 are DMSO 
controls and no peptide controls respectively. Graph A shows the IFN-γ responses whilst 
Graph B shows the IL-2 responses. Peptides selected for use in subsequent 
immunoassays are shown in black. 




A: IFN-gamma responses (x-axis: peptide number) 
B: IL-2 responses (x-axis: peptide number) 
Figure 23 - Cellular responses by ELISPOT at day 77 following PMID immunisation 
at day 0, 21, 42, and 70 with pVAC-P501S (JNW680, B6-9) and pVAC empty (A4-6). 

Peptides 18, 22 and 48 were used at 50 µg/ml. CPC-P501S protein was used at 20 µg/ml. 
Graph A shows the IFN-γ responses whilst Graph B shows the IL-2 responses. 
Figure 24 - Comparison of P501S and CPC-P501S. 

Cellular responses were measured by IL-2 ELISPOT using peptide 22 (10 µg/ml) at day 28. 
Mice were immunised by PMID at day 0 and 21 with pVAC empty (control), pVAC-P501S 
(JNW680) and CPC-P501S (JNW735). 




Figure 25 - Immune response (lymphoproliferation on spleen cells) following 
protein immunisation with CPC-P501S. 










